Intellectual Property Rights



IPRs essential or potentially essential to the present document may have been declared to ETSI. The information 
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found 
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in 
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web 
server ( http://webapp.etsi.org/IPR/home.asp ). 

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee 
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web 
server) which are, or may be, or may become, essential to the present document. 



Foreword 

This Technical Specification (TS) has been produced by Joint Technical Committee (JTC) Broadcast of the European 
Broadcasting Union (EBU), Comité Européen de Normalisation ELECtrotechnique (CENELEC) and the European
Telecommunications Standards Institute (ETSI). 

Founded in September 1993, the DVB Project is a market-led consortium of public and private sector organizations in 
the television industry. Its aim is to establish the framework for the introduction of MPEG-2 based digital television 
services. Now comprising over 200 organizations from more than 25 countries around the world, DVB fosters 
market-led systems, which meet the real needs, and economic circumstances, of the consumer electronics and the 
broadcast industry. 



Introduction 



The present document addresses the use of video and audio coding in DVB services delivered over IP protocols. It 
specifies the use of H.264/AVC video as specified in ITU-T Recommendation H.264 and ISO/IEC 14496-10 [1], VC-1 
video as specified in SMPTE 421M [17], HE AAC v2 audio as specified in ISO/IEC 14496-3 [2], Extended AMR-WB
(AMR-WB+) audio as specified in TS 126 290 [12] and AC-3 and Enhanced AC-3 audio as specified in
TS 102 366 [21]. 

The present document adopts a "toolbox" approach for the general case of DVB applications delivered directly over IP. 
A common generic toolbox is used by all DVB services, where each DVB application can select the most appropriate 
tool from within that toolbox. Annex B of the present document specifies application-specific constraints on the use of
the toolbox for the particular case of DVB IP Datacast services. 

Clauses 4 to 6 of the present document provide the Digital Video Broadcasting (DVB) specifications for the systems, 
video, and audio layer, respectively. For information, some of the key features are summarized below, but clauses 4 to 6 
should be consulted for all normative specifications: 

Systems: 

• H.264/AVC, VC-1, HE AAC v2, AMR-WB+, AC-3 and Enhanced AC-3 encoded data is delivered over IP in
RTP packets.

Video: 

The following hierarchical classification of IP-IRDs is specified through Capability categorization of the video codec: 



• Capability A IP-IRDs are capable of decoding either bitstreams conforming to H.264/AVC Baseline profile
at Level 1b with constraint_set1_flag being equal to 1 as specified in [1] or else bitstreams conforming to
VC-1 Simple Profile at level LL as specified in [17] or else both.

• Capability B IP-IRDs are capable of decoding either bitstreams conforming to H.264/AVC Baseline profile
at Level 1.2 with constraint_set1_flag being equal to 1 as specified in [1] or else bitstreams conforming to
VC-1 Simple Profile at level ML as specified in [17] or else both.

• Capability C IP-IRDs are capable of decoding either bitstreams conforming to H.264/AVC Baseline profile
at Level 2 with constraint_set1_flag being equal to 1 as specified in [1] or else bitstreams conforming to VC-1
Advanced Profile at level L0 as specified in [17] or else both.

• Capability D IP-IRDs are capable of decoding either bitstreams conforming to H.264/AVC Main profile at
level 3 as specified in [1] (and optionally capable of decoding bitstreams conforming to H.264/AVC High
profile at level 3 as specified in [1]) or else bitstreams conforming to VC-1 Advanced Profile at level L1 as
specified in [17] or else both.

• Capability E IP-IRDs are capable of decoding either bitstreams conforming to H.264/AVC High profile at
level 4 as specified in [1] or else bitstreams conforming to VC-1 Advanced Profile at level L3 as specified
in [17] or else both.

IP-IRDs labelled with a particular capability Y are also capable of decoding H.264/AVC and/or VC-1
bitstreams that can be decoded by IP-IRDs labelled with a particular capability X, with X being an earlier
letter than Y in the alphabet. For instance, a Capability D IP-IRD that is capable of decoding bitstreams
conforming to Main Profile at level 3 of H.264/AVC will additionally be capable of decoding H.264/AVC
bitstreams that are also decodable by IP-IRDs with capabilities A, B or C.

It is possible that an IP-IRD may support the decoding of H.264/AVC at Capability M and VC-1 at Capability
N where M and N are not the same.
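
For illustration only, the capability hierarchy summarized above can be written as a small lookup table. The following Python sketch is not part of the specification; the names `Capability`, `H264_BY_CAPABILITY`, `VC1_BY_CAPABILITY` and `can_decode` are hypothetical, and the profile/level pairs are simply those listed for Capabilities A to E.

```python
from enum import IntEnum


class Capability(IntEnum):
    """IP-IRD capability classes; a later letter includes the earlier ones."""
    A = 1
    B = 2
    C = 3
    D = 4
    E = 5


# (profile, level) per capability, copied from the summary list above.
H264_BY_CAPABILITY = {
    Capability.A: ("Baseline", "1b"),
    Capability.B: ("Baseline", "1.2"),
    Capability.C: ("Baseline", "2"),
    Capability.D: ("Main", "3"),
    Capability.E: ("High", "4"),
}

VC1_BY_CAPABILITY = {
    Capability.A: ("Simple", "LL"),
    Capability.B: ("Simple", "ML"),
    Capability.C: ("Advanced", "L0"),
    Capability.D: ("Advanced", "L1"),
    Capability.E: ("Advanced", "L3"),
}


def can_decode(ird_capability, stream_capability):
    """True if an IRD of class `ird_capability` decodes a bitstream of class
    `stream_capability`, for a codec that the IRD supports."""
    return stream_capability <= ird_capability
```

Because an IP-IRD may hold different capability letters for H.264/AVC and VC-1, a receiver model would keep one `Capability` value per supported codec and call `can_decode` with the value for the codec in use.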



Audio: 



• IP-IRDs are capable of decoding either bitstreams conforming to MPEG-4 Audio HE AAC v2 Profile, or else 
bitstreams conforming to AMR-WB+, or else bitstreams conforming to AC-3, or else bitstreams conforming to
Enhanced AC-3, or any combination of the four. 

• Sampling rates between 8 kHz and 48 kHz are supported by IP-IRDs. 

• IP-IRDs support mono, parametric stereo (when MPEG-4 Audio HE AAC v2 Profile is used) and 2-channel 
stereo; support of multi-channel is optional.

An IP-IRD of one of the capability classes A to E above meets the minimum functionality, as specified in the present 
document, for decoding H.264/AVC or VC-1 video and for decoding HE AAC v2, AMR-WB+, AC-3 or Enhanced
AC-3 audio delivered over an IP network. The specification of this minimum functionality in no way prohibits IP-IRD 
manufacturers from including additional features, and should not be interpreted as stipulating any form of upper limit to 
the performance. 

Where an IP-IRD feature described in the present document is mandatory, the word "shall" is used and the text is in 
italic; all other features are optional. The specifications presented for IP-IRDs observe the following principles: 

• IP-IRDs allow for future compatible extensions to the bit-stream syntax; 

• all "reserved", "unspecified", and "private" bits in H.264/AVC, VC-1, HE AAC v2, AMR-WB+, AC-3,
Enhanced AC-3 and IP protocols are ignored by IP-IRDs not designed to make use of them. 

The rules of operation for the encoders are features and constraints which the encoding system should adhere to in order 
to ensure that the transmissions can be correctly decoded. These constraints may be mandatory or optional. Where a 
feature or constraint is mandatory, the word "shall" is used and the text is italic; all other features are optional. 
Scope



The present document specifies the use of H.264/AVC, VC-1, HE AAC v2, AMR-WB+, AC-3 and Enhanced AC-3 for 
DVB conforming delivery in RTP packets over IP networks. The decoding of H.264/AVC, VC-1, HE AAC v2,
AMR-WB+, AC-3 and Enhanced AC-3 in IP-IRDs is specified as well as rules of operation that encoders must apply to 
ensure that transmissions can be correctly decoded. These specifications may be mandatory, recommended or optional. 

Annex A of the present document provides an informative description for the normative contents of the present 
document and the specified codecs. 

Annex B of the present document defines application-specific constraints on the use of H.264/AVC, VC-1, HE AAC v2 
and AMR-WB+ for DVB IP Datacast services.
Definitions and abbreviations



3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply: 

3GP file: file based on 3GPP file format [11] and its extensions and typically having a .3gp extension in its filename 

bitstream: coded representation of a video or audio signal 

DVB IP datacast application: application that complies with the DVB IP Datacast Umbrella Specification 

IP-IRD: integrated Receiver-Decoder for DVB services delivered over IP categorized by a video decoding and 
rendering capability 

MP4 File: file based on ISO base media file format [24] and its extensions and typically having a .mp4 extension in its 
filename 

multi-channel audio: audio signal with more than two channels 
streaming delivery session: instance of delivery of a streaming service which is characterized by a start and end time
and addresses of the IP flows used for delivery of the media streams between start and end time 



3.2 Abbreviations



For the purposes of the present document, the following abbreviations apply: 

3GPP Third Generation Partnership Project 

AAC LC Advanced Audio Coding Low Complexity 

AC-3 Dolby AC-3 audio coding system 

ACELP Algebraic Code Excited Linear Prediction 

AMR-WB Adaptive Multi-Rate-WideBand 

AMR-WB+ Extended AMR-WB 

AOT Audio Object Type 

ASO Arbitrary Slice Ordering 

AU Access Unit 

BWE Bandwidth Extension 

CABAC Context Adaptive Binary Arithmetic Coding

CIF Common Interchange Format 

DEMUX DeMUltipleXer 

DRC Dynamic Range Control 

DVB Digital Video Broadcasting 

DVB-H DVB-Handheld 

FMO Flexible Macroblock Ordering 

GOP Group of Picture 

H.264/AVC H.264/Advanced Video Coding 

HDTV High Definition Television 

HE AAC High-Efficiency Advanced Audio Coding 

IDR Instantaneous Decoding Refresh 

IP Internet Protocol 

IPDC IP Data Casting 

IRD Integrated Receiver-Decoder 

LC Low Complexity 

LF Low Frequency 

LL Low Level 

MBMS Multimedia Broadcast/Multicast Service 

ML Medium Level 

MPEG Moving Pictures Experts Group (ISO/IEC JTC 1/SC 29/WG 11)

MTU Maximum Transmission Unit 

MUX Multiplexer 

NAL Network Abstraction Layer 

NTP Network Time Protocol 

PCM Pulse-code modulation 

PS Parametric Stereo 

PSS Packet switched Streaming Service 

QCIF Quarter Common Interchange Format 

QMF Quadrature Mirror Filter 

RTCP RTP Control Protocol 

RTP Real-time Transport Protocol 

RTSP Real-Time Streaming Protocol 

SBR Spectral Band Replication 

SDP Session Description Protocol 

SR Sender Report 

TCP Transmission Control Protocol 

TCX Transform Coded Excitation 

UDP User Datagram Protocol 

VC-1 Advanced Video Coding according to SMPTE Standard 421M 

VCEG Video Coding Experts Group (ITU-T SG16 Q.6: Video Coding) 

VCL Video Coding Layer 

VUI Video Usability Information 
Systems layer



The IP-IRD design should be made under the assumption that any legal structure permitted in RTP packets may occur,
even if presently reserved or unused. To allow full upward compatibility with future enhanced versions, a DVB IP-IRD
shall be able to skip over data structures which are currently "reserved", or which correspond to functions not
implemented by the IP-IRD. For example, an IP-IRD shall allow the presence of unknown MIME format parameters for
RTP payload formats, while ignoring their meaning.
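
As an informal illustration of this tolerance rule, the sketch below parses a semicolon-separated list of format parameters (as carried, for example, in an SDP "a=fmtp" attribute) and silently drops any parameter the receiver does not recognize. The function name `parse_format_parameters` and the example parameter `x-future` are hypothetical; `profile-level-id` and `packetization-mode` are parameters of the H.264 payload format.

```python
def parse_format_parameters(value, known_names):
    """Parse 'name=value; name=value' and keep only recognized parameters.

    Unknown parameters are ignored rather than treated as errors, so that
    parameters added by future payload format versions do not break decoding.
    """
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, param_value = item.partition("=")
        name = name.strip().lower()
        if name in known_names:
            params[name] = param_value.strip()
        # else: unknown parameter, skipped on purpose
    return params
```

A receiver that only understands the two H.264 parameters named above would therefore accept a description carrying additional, unknown parameters and continue with the subset it understands.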

Annex B defines application-specific constraints for DVB IP Datacast services. 

4.1 Transport over IP Networks/RTP Packetization Formats

When H.264/AVC, VC-1, HE AAC v2, AMR-WB+, AC-3 and Enhanced AC-3 data are transported over IP networks,
RTP, a Transport Protocol for Real-Time Applications as defined in RFC 3550 [3], shall be used. This clause specifies
the transport of H.264/AVC, VC-1, HE AAC v2, AMR-WB+, AC-3 and Enhanced AC-3 in RTP packets for delivery
over IP networks and for decoding of such RTP packets in the IP-IRD.
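
For orientation, every payload format referenced in this clause sits behind the same RTP fixed header of RFC 3550 [3]. The following Python sketch, which is illustrative and not normative, splits a received packet into header fields and codec payload. The function name `parse_rtp_header` is hypothetical, and padding (the P bit) is deliberately not handled.

```python
import struct


def parse_rtp_header(packet):
    """Split an RTP packet into its fixed-header fields and payload bytes.

    Handles the 12-byte fixed header, the CSRC list and an optional header
    extension; trailing padding is left in the payload (assumption: not used).
    """
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    header_len = 12 + 4 * (b0 & 0x0F)          # skip CSRC identifiers
    if b0 & 0x10:                              # X bit: header extension present
        if len(packet) < header_len + 4:
            raise ValueError("truncated header extension")
        ext_words = struct.unpack("!H", packet[header_len + 2:header_len + 4])[0]
        header_len += 4 + 4 * ext_words
    if len(packet) < header_len:
        raise ValueError("truncated RTP header")
    return {
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }
```

The returned payload would then be handed to the depacketizer for the codec in use, selected through the dynamic payload type signalled for the session.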

The specification for the use of video and audio coding in broadcasting applications based on the MPEG-2 Transport 
Stream is given in TS 101 154 [7], whilst that for contribution and primary distribution applications is given in 
TS 102 154 [8]. RFC 2250 [6] is used for the transport of an MPEG-2 TS in RTP packets over IP. 

While the general RTP specification is defined in RFC 3550 [3], RTP payload formats are codec specific and defined in 
separate RFCs. The specific formats of the RTP packets are specified in clause 4.1.1 for H.264/AVC, in clause 4.1.2 for 
VC-1, in clause 4.1.3 for HE AAC v2, in clause 4.1.4 for AMR-WB+, in clause 4.1.5 for AC-3 and in clause 4.1.6 for
Enhanced AC-3. 

4.1.1 RTP packetization of H.264/AVC

For transport over IP, the H.264/AVC data is packetized in RTP packets using RFC 3984 [5].

Encoding: RFC 3984 [5] shall be used for packetization into RTP.

Decoding: An IP-IRD that supports H.264/AVC shall be able to receive RTP packets with H.264/AVC data as
defined in RFC 3984 [5].

4.1.2 RTP packetization of VC-1

For transport over IP, the VC-1 data is packetized in RTP packets using RFC 4425 [18].

Encoding: RFC 4425 [18] shall be used for packetization into RTP.

Decoding: An IP-IRD that supports VC-1 shall be able to receive RTP packets with VC-1 data as defined in
RFC 4425 [18].

4.1.3 RTP packetization of HE AAC v2

For transport over IP, the HE AAC v2 data is packetized in RTP packets using RFC 3640 [4].

Encoding: RFC 3640 [4] shall be used for packetization into RTP.

Decoding: An IP-IRD that supports HE AAC v2 shall support RFC 3640 [4] to receive HE AAC v2 data
contained in RTP packets.

4.1.4 RTP packetization of AMR-WB+

For transport over IP, the AMR-WB+ data is packetized in RTP packets using RFC 4352 [13].

Encoding: RFC 4352 [13] shall be used for packetization in RTP.

Decoding: An IP-IRD that supports AMR-WB+ shall support [13] to receive AMR-WB+ data contained in
RTP packets.

4.1.5 RTP packetization of AC-3 

For transport over IP, the AC-3 data is packetized in RTP packets using RFC 4184 [22].

Encoding: RFC 4184 [22] shall be used for packetization in RTP.

Decoding: An IP-IRD that supports AC-3 shall support [22] to receive AC-3 data contained in RTP packets.

4.1.6 RTP packetization of Enhanced AC-3

For transport over IP, the Enhanced AC-3 data is packetized in RTP packets using RFC 4598 [23].

Encoding: RFC 4598 [23] shall be used for packetization in RTP.

Decoding: An IP-IRD that supports Enhanced AC-3 shall support [23] to receive Enhanced AC-3 data
contained in RTP packets.

4.2 File storage for download services 

4.2.1 MP4 files

This clause describes usage of MP4 files based on ISO base media file format [24] in download services supporting this 
feature. 

Encoding: The MP4 file shall be created according to the MPEG-4 Part 12 [24] specification with the
constraints described below.

Zero or one video track and one audio track shall be stored in the file for default presentation of
contents. The default video track (if present) shall contain the Video Elementary Stream for the media
format used. The default audio track shall contain the Audio Elementary Stream for the media
format used.

The default video track (if present) shall have the lowest track ID among the video tracks stored in
the file. The default audio track shall have the lowest track ID among the audio tracks stored in
the file.

For the default video track (if present) and the default audio track, "Track_enabled" shall be set to
the value of 1 in the "flags" field of the Track Header Box of the track.

The "moov" box shall be positioned after the "ftyp" box and before the first "mdat" box. If a "moof" box is
present, it shall be positioned before the corresponding "mdat" box.

Within a track, chunks shall be in decoding time order within the media-data box "mdat".

Video and audio tracks shall be organized as interleaved chunks. The duration of samples stored
in a chunk shall not exceed 1 second.



If the size of the "moov" box becomes bigger than 1 Mbyte, the file shall be fragmented by using "moof"
headers. The size of the "moov" box shall be equal to or less than 1 Mbyte. The size of "moof" boxes
shall be equal to or less than 300 kbytes.

For video, random accessible samples should be stored as the first sample of each "traf". In the
case of gradual decoder refresh, a random accessible sample and the corresponding recovery point
should be stored in the same movie fragment. In the case of audio, samples having the closest
presentation time for every video random accessible sample should be stored as the first sample of
each "traf". Hence, the first samples of each media in the "moof" have approximately equal
presentation times.

The sample size box ("stsz") shall be used. The compact sample size box ("stz2") shall not be used.

Only the Media Data Box ("mdat") is allowed to have size 1. Only the last Media Data Box ("mdat") in the
file is allowed to have size 0. Other boxes shall not have size 1.

Tracks other than the default video and audio tracks may be stored in the file.

Decoding: An IP-IRD that supports this feature shall be able to render the default video track and the default
audio track stored in the file as described above. The IP-IRD shall also be tolerant of additional
tracks other than the default video and audio tracks stored in the file.
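
Several of the box-level constraints in this clause can be checked mechanically. The Python sketch below is illustrative only: it walks the top-level boxes of a file image held in memory and reports violations of the ordering and size rules. The names `top_level_boxes` and `check_layout` are hypothetical, and the byte limits assume that "1 Mbyte" and "300 kbytes" mean 10^6 and 300 x 10^3 bytes respectively. Track-level rules (default tracks, interleaving, "stsz" usage) are not covered.

```python
import struct

MAX_MOOV_BYTES = 1_000_000  # assumption: "1 Mbyte" read as 10**6 bytes
MAX_MOOF_BYTES = 300_000    # assumption: "300 kbytes" read as 300 * 10**3 bytes


def top_level_boxes(data):
    """Yield (type, total_size, raw_size_field) for each top-level box."""
    pos = 0
    while pos + 8 <= len(data):
        raw_size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        size, header = raw_size, 8
        if raw_size == 1:            # size 1: a 64-bit "largesize" follows
            if pos + 16 > len(data):
                raise ValueError("truncated 64-bit box header")
            size, header = struct.unpack(">Q", data[pos + 8:pos + 16])[0], 16
        elif raw_size == 0:          # size 0: box extends to the end of the file
            size = len(data) - pos
        if size < header:
            raise ValueError("invalid box size")
        yield box_type.decode("latin-1"), size, raw_size
        pos += size


def check_layout(data):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    boxes = list(top_level_boxes(data))
    types = [box_type for box_type, _, _ in boxes]
    if "ftyp" not in types or "moov" not in types:
        problems.append("missing ftyp or moov")
    else:
        if types.index("moov") < types.index("ftyp"):
            problems.append("moov precedes ftyp")
        if "mdat" in types and types.index("moov") > types.index("mdat"):
            problems.append("moov follows first mdat")
    for box_type, size, raw_size in boxes:
        if box_type == "moov" and size > MAX_MOOV_BYTES:
            problems.append("moov larger than 1 Mbyte")
        if box_type == "moof" and size > MAX_MOOF_BYTES:
            problems.append("moof larger than 300 kbytes")
        if raw_size == 1 and box_type != "mdat":
            problems.append("size 1 used on non-mdat box " + box_type)
    return problems
```

A download client could run such a check before playback and fall back to tolerant parsing if any problem is reported.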



4.2.1.1 MP4 file storage of H.264 video

H.264 video bitstreams are stored in MP4 files using AVC file format as specified in [25].

Encoding: AVC file format [25] shall be used for storing H.264 video tracks in MP4 files. In addition the
restrictions defined in clause 4.2.1 shall apply.

Decoding: An IP-IRD that supports this feature shall support [25] to receive H.264 data contained in MP4
files.

4.2.1.2 MP4 file storage of VC-1 video

VC-1 video bitstreams are stored in MP4 files using SMPTE RP2025 [19].

Encoding: SMPTE RP2025 [19] shall be used for storing VC-1 video tracks in MP4 files. In addition the
restrictions defined in clause 4.2.1 shall apply.

Decoding: An IP-IRD that supports this feature shall support [19] to receive VC-1 data contained in MP4
files.

4.2.2 3GP files 

This clause describes usage of 3GPP file format [11] in download services supporting this feature. 

Encoding: The 3GP file shall conform to the Basic profile of the 3GPP Release 6 file format [11].

Decoding: An IP-IRD that supports this feature shall be able to parse Basic profile 3GP files according to the
3GPP Release 6 file format specification [11].

4.2.2.1 3GP file storage of H.264

The specifications in clause 4.2.2 shall apply. 

4.2.2.2 3GP file storage of VC-1 

VC-1 video bitstreams are stored in 3GP files using SMPTE RP2025 [19].

Encoding: SMPTE RP2025 [19] shall be used for storing VC-1 video tracks in 3GP files. In addition the
restrictions defined in clause 4.2.2 apply.

Decoding: An IP-IRD that supports this feature shall support [19] to receive VC-1 data contained in 3GP
files.



Video 



Each IP-IRD shall be capable of decoding either video bitstreams conforming to H.264/AVC as specified in [1] or else
video bitstreams conforming to VC-1 as specified in [17] or else both. Clause 5.1 describes the guidelines for encoding
with H.264/AVC in DVB IP Network bit-streams, and for decoding this bit-stream in the IP-IRD. Clause 5.2 describes 
the guidelines for encoding with VC-1 in DVB IP Network bit-streams, and for decoding this bit-stream in the IP-IRD. 
Annex B specifies application-specific constraints on the use of H.264/AVC and VC-1 for DVB IP Datacast services. 

5.1 H.264/AVC 

This clause describes the guidelines for H.264/AVC video encoding and for decoding of H.264/AVC data in the
IP-IRD. 

The bitstreams resulting from H.264/AVC encoding shall conform to the corresponding profile specification in [1]. The
IP-IRD shall allow any legal structure as permitted by the specifications in [1] in the encoded video stream even if
presently "reserved" or "unused". 

To allow full compliance to the specifications in [1] and upward compatibility with future enhanced versions, an 
IP-IRD shall be able to skip over data structures which are currently "reserved", or which correspond to functions not 
implemented by the IP-IRD. 

5.1.1 Profile and level 

Encoding: Capability A H.264/AVC Bitstreams shall conform to the restrictions described in
ITU-T Recommendation H.264/ISO/IEC 14496-10 [1] for Level 1b of the Baseline Profile with
constraint_set1_flag being equal to 1. In addition, in applications where decoders support the
Main or the High Profile, the bitstream may optionally conform to these profiles.

Capability B H.264/AVC Bitstreams shall conform to the restrictions described in
ITU-T Recommendation H.264/ISO/IEC 14496-10 [1] for Level 1.2 of the Baseline Profile with
constraint_set1_flag being equal to 1. In addition, in applications where decoders support the
Main or the High Profile, the bitstream may optionally conform to these profiles.

Capability C H.264/AVC Bitstreams shall conform to the restrictions described in
ITU-T Recommendation H.264/ISO/IEC 14496-10 [1] for Level 2 of the Baseline Profile with
constraint_set1_flag being equal to 1. In addition, in applications where decoders support the
Main or the High Profile, the bitstream may optionally conform to these profiles.

Capability D H.264/AVC Bitstreams shall conform to the restrictions described in
ITU-T Recommendation H.264/ISO/IEC 14496-10 [1] for Level 3 of the Main Profile. In addition,
in applications where decoders support the High Profile, the bitstream may optionally conform to
the High Profile.

Capability E H.264/AVC Bitstreams shall conform to the restrictions described in
ITU-T Recommendation H.264/ISO/IEC 14496-10 [1] for Level 4 of the High Profile.

Decoding: Capability A IP-IRDs that support H.264/AVC shall be capable of decoding and rendering
pictures using Capability A H.264/AVC Bitstreams. Support of the Main Profile and other profiles
beyond Baseline Profile with constraint_set1_flag equal to 1 is optional. Support of levels beyond
Level 1b is optional.

Capability B IP-IRDs that support H.264/AVC shall be capable of decoding and rendering
pictures using Capability A and B H.264/AVC Bitstreams. Support of the Main Profile and other
profiles beyond Baseline Profile with constraint_set1_flag equal to 1 is optional. Support of levels
beyond Level 1.2 is optional.

Capability C IP-IRDs that support H.264/AVC shall be capable of decoding and rendering
pictures using Capability A, B and C H.264/AVC Bitstreams. Support of the Main Profile and
other profiles beyond Baseline Profile with constraint_set1_flag equal to 1 is optional. Support of
levels beyond Level 2 is optional.

Capability D IP-IRDs that support H.264/AVC shall be capable of decoding and rendering
pictures using Capability A, B, C and D H.264/AVC Bitstreams. Support of the High Profile and
other profiles beyond Main Profile is optional. Support of levels beyond Level 3 is optional.

Capability E IP-IRDs that support H.264/AVC shall be capable of decoding and rendering
pictures using Capability A, B, C, D and E H.264/AVC Bitstreams. Support of profiles beyond
High Profile is optional. Support of levels beyond Level 4 is optional.

If an IP-IRD encounters an extension which it cannot decode, it shall discard the following data
until the next start code prefix (to allow backward compatible extensions to be added in the
future).

5.1.2 Video usability information

It is recommended that the IP-IRD support the use of Video Usability Information of the following syntax elements:

• Timing information (time_scale, num_units_in_tick, and fixed_frame_rate_flag).

• Picture Structure Information (pic_struct_present_flag).

• Maximum number of frames that precede any frame in the coded video sequence in decoding order and follow
it in output order (num_reorder_frames).

It is recommended that encoders include these fields as appropriate. 
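
As an informal example of how a receiver might use the VUI timing fields, the sketch below derives a nominal frame rate from time_scale and num_units_in_tick. It relies on the commonly used relation for H.264/AVC that one clock tick corresponds to one field, so the frame rate is time_scale / (2 x num_units_in_tick) when fixed_frame_rate_flag is 1; this relation is an assumption of the sketch and the normative semantics are those of [1]. The function name `nominal_frame_rate` is hypothetical.

```python
from fractions import Fraction


def nominal_frame_rate(time_scale, num_units_in_tick, fixed_frame_rate_flag):
    """Return the nominal frame rate in frames/s as an exact fraction.

    Returns None when the stream does not signal a fixed frame rate or the
    timing fields are unusable, in which case RTP timestamps should be used.
    """
    if not fixed_frame_rate_flag or num_units_in_tick <= 0:
        return None
    return Fraction(time_scale, 2 * num_units_in_tick)
```

For example, time_scale = 50 with num_units_in_tick = 1 yields 25 frames/s, and time_scale = 60 000 with num_units_in_tick = 1 001 yields 30 000/1 001 (approximately 29,97) frames/s.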

5.1.3 Frame rate 

Encoding: Each frame rate allowed by the applied H.264/AVC Profile and Level may be used. The maximum
time distance between two pictures should not exceed 0,7 s.

Decoding: An IP-IRD that supports H.264/AVC shall support each frame rate allowed by the H.264/AVC
Profile and Level that is applied for decoding in the IP-IRD. This includes variable frame rate.

5.1.4 Aspect ratio

Encoding: Each sample and picture aspect ratio allowed by the applied H.264/AVC Profile and Level may be
used. It is recommended to avoid very large or very small picture aspect ratios and that those
picture aspect ratios specified in [7] are used.

Decoding: An IP-IRD that supports H.264/AVC shall support each sample and picture aspect ratio permitted
by the applied H.264/AVC Profile and Level.

5.1.5 Luminance resolution 

Encoding: Each luminance resolution allowed by the applied H.264/AVC Profile and Level may be used.

Decoding: An IP-IRD that supports H.264/AVC shall support each luminance resolution permitted by the
applied H.264/AVC Profile and Level.



5.1.6 Chromaticity

Encoding: It is recommended to specify the chromaticity coordinates of the colour primaries of the source
using the syntax elements colour_primaries, transfer_characteristics, and matrix_coefficients in the
VUI. The use of ITU-R Recommendation BT.709 [20] is recommended.

Decoding: An IP-IRD that supports H.264/AVC shall be capable of decoding any allowed values of
colour_primaries, transfer_characteristics, and matrix_coefficients. It is recommended that
appropriate processing be included for the rendering of pictures.

5.1.7 Chrominance format

Encoding: It is recommended to specify the chrominance locations using the syntax elements
chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field in the VUI. It is
recommended to use chroma sample type 0.

Decoding: An IP-IRD that supports H.264/AVC shall be capable of decoding any allowed values of
chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field. It is
recommended that appropriate processing be included for the rendering of pictures.



5.1.8 Random access points



5.1.8.1 Definition 

A Random Access Point (RAP) shall be either:

• an IDR picture; or

• an I Picture, with an in-band recovery_point SEI message.

Where the recovery point SEI message is present it shall:

• have the field exact_match_flag set to "1";

• have the field recovery_frame_cnt set to a value equivalent to 500 ms or less;

• only be preceded in the access unit to which it applies by:

    - Access_unit_delimiter NAL, if present.

    - Buffering_period SEI message, if present.

Unless the sequence parameter set and picture parameter set are provided outside the elementary stream, the random
access point shall include exactly one SPS (that is active), and the PPS that is required for decoding the associated 
picture. Note that an I picture need not necessarily be a Random Access Point. An I picture that is not a Random Access 
Point shall not contain a recovery_point SEI message. 

NOTE: The value of recovery_frame_cnt will impact on critical factors such as channel change performance. 
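
To make the 500 ms bound concrete, the following sketch converts it into a largest permissible recovery_frame_cnt for a given frame rate. It assumes a constant frame rate and that recovery_frame_cnt counts whole frames; both are assumptions of this illustration, and the function name `max_recovery_frame_cnt` is hypothetical.

```python
import math
from fractions import Fraction


def max_recovery_frame_cnt(frame_rate, limit_s=Fraction(1, 2)):
    """Largest frame count whose duration does not exceed `limit_s` seconds.

    `frame_rate` is in frames/s and may be an int, float or Fraction.
    """
    return math.floor(Fraction(frame_rate) * limit_s)
```

At 25 frames/s this gives 12 frames (480 ms), and at 30 000/1 001 frames/s it gives 14 frames (approximately 467 ms).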

5.1.8.2 Time Interval between RAPs 

Encoding: The Encoder shall place RAPs (along with associated sequence and picture parameter sets if these
are not provided outside the elementary stream) in the video elementary stream at least once every
5 s. It is recommended that RAPs (along with associated sequence and picture parameter sets if 
these are not provided outside the elementary stream) occur on average at least every 2 s. Where 
channel change times are important it is recommended that RAPs (along with associated sequence 
and picture parameter sets if these are not provided outside the elementary stream) occur more 
frequently, such as every 500 ms. 

In systems where time-slicing is used, it is recommended that each time-slice begins with a 
random access point. 

NOTE 1: Decreasing the time interval between RAPs may reduce channel hopping time and improve trick modes,
but may reduce the efficiency of the video compression. 

NOTE 2: Having a regular interval between RAPs may improve trick mode performance, but may reduce the 
efficiency of the video compression. 
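
The interval rules of this clause can be checked on the encoder output with a few lines of code. The sketch below is illustrative: it takes the times of successive RAPs in seconds and reports whether the mandatory limit (at least one RAP every 5 s) and the recommended average (at least one RAP every 2 s) are met. The function name `check_rap_intervals` and the result keys are hypothetical.

```python
def check_rap_intervals(rap_times_s):
    """Evaluate RAP spacing for a list of RAP times in ascending order (seconds)."""
    gaps = [later - earlier for earlier, later in zip(rap_times_s, rap_times_s[1:])]
    if not gaps:
        raise ValueError("at least two RAP times are needed")
    max_gap = max(gaps)
    mean_gap = sum(gaps) / len(gaps)
    return {
        "max_gap_s": max_gap,
        "mean_gap_s": mean_gap,
        "meets_mandatory_5s": max_gap <= 5,
        "meets_recommended_2s_average": mean_gap <= 2,
    }
```

A stream with RAPs at 0 s, 2 s, 4 s and 10 s fails the mandatory limit because of the 6 s gap, even though most of its intervals are short.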

5.1.9 Sequence parameter sets and picture parameter sets

When changing syntax elements of sequence or picture parameter sets, it is recommended to use different values for 
seq_parameter_set_id or pic_parameter_set_id from the previous active ones, as per ISO/IEC 14496-10 [1]. 

5.2 VC-1 

This clause describes the guidelines for VC-1 video encoding and for decoding of VC-1 data in the IP-IRD. 

The bitstreams resulting from VC-1 encoding shall conform to the corresponding profile specification in [17]. The
IP-IRD shall allow any legal structure as permitted by the specifications in [17] in the encoded video stream even if
presently "reserved" or "unused". 

To allow full compliance to the specifications in [17] and upward compatibility with future enhanced versions, an 
IP-IRD shall be able to skip over data structures which are currently "reserved", or which correspond to functions not 
implemented by the IP-IRD. 

5.2.1 Profile and level 

Encoding: Capability A VC-1 Bitstreams shall conform to the restrictions described in SMPTE 421M [17] for
Simple Profile at level LL.

Capability B VC-1 Bitstreams shall conform to the restrictions described in SMPTE 421M [17] for
Simple Profile at level ML.

Capability C VC-1 Bitstreams shall conform to the restrictions described in SMPTE 421M [17] for
Advanced Profile at level L0.

Capability D VC-1 Bitstreams shall conform to the restrictions described in SMPTE 421M [17]
for Advanced Profile at level L1.

Capability E VC-1 Bitstreams shall conform to the restrictions described in SMPTE 421M [17] for
Advanced Profile at level L3.

Decoding: Capability A IP-IRDs that support VC-1 shall be capable of decoding and rendering pictures
using Capability A VC-1 Bitstreams. Support of additional profiles and levels is optional.

Capability B IP-IRDs that support VC-1 shall be capable of decoding and rendering pictures
using Capability A and B VC-1 Bitstreams. Support of additional profiles and levels is optional.

Capability C IP-IRDs that support VC-1 shall be capable of decoding and rendering pictures
using Capability A, B and C VC-1 Bitstreams. Support of additional profiles and levels is optional.

Capability D IP-IRDs that support VC-1 shall be capable of decoding and rendering pictures
using Capability A, B, C and D VC-1 Bitstreams. Support of additional profiles and levels is
optional.

Capability E IP-IRDs that support VC-1 shall be capable of decoding and rendering pictures using
Capability A, B, C, D and E VC-1 Bitstreams. Support of additional profiles and levels is optional.

If an IP-IRD encounters an extension which it cannot decode, it shall discard the following data
until the next start code prefix (to allow backward compatible extensions to be added in the
future).

5.2.2 Frame rate 

Encoding: Each frame rate allowed by the applied VC-1 Profile and Level may be used. The maximum time
distance between two pictures should not exceed 0,7 s.

Decoding: An IP-IRD that supports VC-1 shall support each frame rate allowed by the VC-1 Profile and
Level that is applied for decoding in the IP-IRD. This includes variable frame rate.

5.2.3 Aspect ratio 

Encoding: Each sample and picture aspect ratio allowed by the applied VC-1 Profile and Level may be used.
It is recommended to avoid very large or very small picture aspect ratios and that those picture
aspect ratios specified in [7] are used.

Decoding: An IP-IRD that supports VC-1 shall support each sample and picture aspect ratio permitted by the
applied VC-1 Profile and Level.

5.2.4 Luminance resolution 

Encoding: Each luminance resolution allowed by the applied VC-1 Profile and Level may be used. 

Decoding: An IP-IRD that supports VC-1 shall support each luminance resolution permitted by the applied 
VC-1 Profile and Level. 



5.2.5 Chromaticity 

Encoding: It is recommended to specify the chromaticity coordinates of the colour primaries of the source 
using the syntax elements COLOR_PRIM, TRANSFER_CHAR and MATRIX_COEF, if these 
syntax elements are allowed by the applied VC-1 Profile. 

For Advanced Profile, the use of ITU-R Recommendation BT.709 [20] is recommended (video 
source corresponding to COLOR_PRIM, TRANSFER_CHAR and MATRIX_COEF field values 
equal to "1", "1", "1"). 

For Simple and Main Profile, the default value for the COLOR_PRIM, TRANSFER_CHAR and 
MATRIX_COEF field values shall be "6", "6", "6" for video sources originating from a 29.97 
frame/s system and shall be "5", "5", "6" for video sources originating from a 25 frame/s system. 

Decoding: An IP-IRD that supports VC-1 shall be capable of decoding any allowed values of COLOR_PRIM, 
TRANSFER_CHAR and MATRIX_COEF. It is recommended that appropriate processing be 
included for the rendering of pictures. 

5.2.6 Random access points 



Encoding: Where channel change times are important it is recommended that a Sequence Header and Entry 
Point Header are encoded at least once every 500 ms, if these syntax elements are allowed by the 
applied VC-1 Profile. In applications where channel change time is an issue but coding efficiency 
is critical, it is recommended that a Sequence Header and Entry Point Header are encoded at least 
once every 2 s, if these syntax elements are allowed by the applied VC-1 Profile. For those 
applications where channel change time is not an issue, it is recommended that a Sequence Header 
and Entry Point Header are sent at least once every 5 s, if these syntax elements are allowed by the 
applied VC-1 Profile. 

In systems where time-slicing is used, it is recommended that each time-slice begins with a 
Sequence Header and Entry Point Header, if these syntax elements are allowed by the applied 
VC-1 Profile. 

NOTE 1: Increasing the frequency of Sequence Header and Entry Point Header will reduce channel hopping time 
but will reduce the efficiency of the video compression. 

NOTE 2: Having a regular interval between Entry Point Headers may improve trick mode performance, but may 
reduce the efficiency of the video compression. 
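
The three recommended intervals above can be summarized in a small check, sketched here for illustration. The dictionary keys and the function name are labels invented for this example; only the values 500 ms, 2 s and 5 s come from the text.

```python
# Recommended maximum spacing, in seconds, between successive
# Sequence Header plus Entry Point Header pairs (values from the text above).
RECOMMENDED_MAX_INTERVAL_S = {
    "channel_change_important": 0.5,
    "channel_change_vs_efficiency": 2.0,
    "channel_change_not_an_issue": 5.0,
}


def meets_recommendation(access_point_times_s, application):
    """Return True if every gap between consecutive random access points
    is within the recommended interval for the given application class."""
    limit = RECOMMENDED_MAX_INTERVAL_S[application]
    gaps = [b - a for a, b in zip(access_point_times_s, access_point_times_s[1:])]
    return all(gap <= limit for gap in gaps)
```

An encoder test harness could feed the presentation times of the random access points it emitted into such a check.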



6 Audio 

Each IP-IRD shall be capable of decoding either audio bitstreams conforming to HE AAC v2 as specified in 
ISO/IEC 14496-3 [2] or else audio bitstreams conforming to Extended AMR-WB (AMR-WB+) as specified in 
TS 126 290 [12] or else audio bitstreams conforming to AC-3 or Enhanced AC-3 as specified in TS 102 366 [21] or any 
combination of the four. Clause 6.1 describes the guidelines for encoding with MPEG-4 AAC, MPEG-4 HE AAC 
profile and MPEG HE AAC v2 profile and for decoding this bit-stream in the IP-IRD. Clause 6.2 describes the 
guidelines for encoding with AMR-WB+ and for decoding this bit-stream in the IP-IRD. Clause 6.3 describes the 
guidelines for encoding with AC-3 and for decoding this bit-stream in the IP-IRD. Clause 6.4 describes the guidelines 
for encoding with Enhanced AC-3 and for decoding this bit-stream in the IP-IRD. Annex B specifies 
application-specific constraints on the use of HE AAC v2 and AMR-WB+ for DVB IP Datacast services. 

The recommended level for reference tones for transmission is 18 dB below clipping level, in accordance with EBU 
Recommendation R.68 [9]. 

6.1 MPEG-4 AAC profile, MPEG-4 HE AAC profile and MPEG 
HE AAC v2 profile 

For HE AAC, the audio encoding shall conform to the requirements defined in ISO/IEC 14496-3 including 
Amendments 1 and 2 [2]. 

For HE AAC v2, the audio encoding shall conform to the requirements defined in ISO/IEC 14496-3 including 
Amendments 1 and 2 [2]. 

The IP-IRD design should be made under the assumption that any legal structure as permitted by ISO/IEC 14496-3 
including Amendments 1 and 2 [2] may occur in the broadcast stream even if presently reserved or unused. To allow 
full compliance to ISO/IEC 14496-3 [2] and upward compatibility with future enhanced versions, a DVB IP-IRD shall 
be able to skip over data structures which are currently "reserved", or which correspond to functions not implemented 
by the IP-IRD. For example, an IP-IRD which is not designed to make use of the extension payload shall skip over that 
portion of the bit-stream. 

The following clauses are based on ISO/IEC 14496-3 including Amendments 1 and 2 [2]. 

6.1.1 Audio mode 

Encoding: The audio shall be encoded in mono, parametric stereo or 2-channel-stereo according to the 
functionality defined in the HE AAC v2 Profile Level 2 or in multi-channel according to the 
functionality defined in the HE AAC v2 Profile Level 4, as specified in ISO/IEC 14496-3 [2]. A 
simulcast of a mono/parametric stereo/stereo signal together with the multi-channel signal is 
optional. 

Decoding: An IP-IRD that supports HE AAC v2 shall be capable of decoding in mono, parametric stereo or 
2-channel-stereo of the functionality defined in the HE AAC v2 Profile Level 2, as specified in 
ISO/IEC 14496-3 [2]. The support of multi-channel decoding in an IP-IRD is optional. 

6.1.2 Profiles 

Encoding: The encoder shall use either the AAC Profile or the HE AAC Profile or the HE AAC v2 Profile. 
Use of the HE AAC v2 Profile is recommended. 

Decoding: An IP-IRD that supports HE AAC v2 shall be capable of decoding the HE AAC v2 Profile. 

6.1.3 Bit rate 

Encoding: Audio may be encoded at any bit rate allowed by the applied profile and selected Level. 

Decoding: An IP-IRD that supports HE AAC v2 shall support any bit rate allowed by the HE AAC v2 Profile 
and selected Level. 



6.1.4 Sampling frequency 



Encoding: Any of the audio sampling rates of the HE AAC v2 Profile Level 2 may be used for mono, 
parametric stereo and 2-channel stereo and of the HE AAC v2 Profile Level 4 for multichannel 
audio. 

Decoding: An IP-IRD that supports HE AAC v2 shall support each audio sampling rate permitted by the 
HE AAC v2 Profile Level 2 for mono, parametric stereo and 2-channel stereo and of the 
HE AAC v2 Profile Level 4 for multichannel audio. 

6.1.5 Dynamic range control 

Encoding: The encoder may use the MPEG-4 AAC Dynamic Range Control (DRC) tool. 

Decoding: An IP-IRD that supports HE AAC v2 shall support the MPEG-4 AAC Dynamic Range Control 
(DRC) tool. 

6.1.6 Matrix downmix 

Decoding: An IP-IRD that supports HE AAC v2 shall support the matrix downmix as defined in MPEG-4. 



6.2 AMR-WB+ 



Encoding and decoding of AMR-WB+ data shall follow the guidelines described in this clause, which are 
based on TS 126 290 [12]. 

For AMR-WB+ the audio encoding shall conform to the requirements defined in TS 126 290 [12]. 

6.2.1 Audio mode 

Encoding: The audio shall be encoded in mono or stereo according to the functionality defined in the 
AMR-WB+ [12]. 

Decoding: An IP-IRD that supports AMR-WB+ shall be capable of decoding in mono and stereo the 
functionality defined in the AMR-WB+, as specified in TS 126 290 [12]. 

6.2.2 Sampling frequency 

Encoding: Any of the audio sampling rates of the AMR-WB+ may be used for mono and stereo. 

Decoding: An IP-IRD that supports AMR-WB+ shall support each audio sampling rate permitted by the 
AMR-WB+ for mono and stereo. 

6.3 AC-3 

The encoding and decoding of an AC-3 elementary stream shall conform to the requirements defined in 
TS 102 366 [21] excluding annex E. Annex E specifies the Enhanced AC-3 bitstream syntax. 

6.3.1 Audio mode 

Encoding: The audio shall be encoded in mono, 2-channel-stereo or multi-channel, as specified in 
TS 102 366 [21], excluding annex E. 

Decoding: An IP-IRD that supports AC-3 shall be capable of decoding to mono, or 2-channel-stereo PCM, as 
specified in TS 102 366 [21], excluding annex E. Support for decoding to multi-channel PCM in 
an IP-IRD is optional. 

6.3.2 Bit rate 

Encoding: Audio may be encoded at any bit rate listed in TS 102 366 [21], excluding annex E. 

Decoding: An IP-IRD that supports AC-3 shall support all bit rates listed in TS 102 366 [21], excluding 
annex E. 

6.3.3 Sampling frequency 

Encoding: Audio may be encoded at any sample rate listed in TS 102 366 [21], excluding annex E. 

Decoding: An IP-IRD that supports AC-3 shall support all sample rates listed in TS 102 366 [21], excluding 
annex E. 

6.4 Enhanced AC-3 

The encoding and decoding of an Enhanced AC-3 elementary stream shall conform to the requirements defined in 
TS 102 366 [21] including annex E. 

6.4.1 Audio mode 

Encoding: The audio shall be encoded in mono, 2-channel-stereo or multi-channel, as specified in 
TS 102 366 [21]. 

Decoding: An IP-IRD that supports Enhanced AC-3 shall be capable of decoding to mono, or 2-channel-stereo 
PCM, as specified in TS 102 366 [21]. Support for decoding to multi-channel PCM in an 
IP-IRD is optional. 

6.4.2 Substreams 

Encoding: The Enhanced AC-3 elementary stream shall contain no more than three independent substreams 
in addition to the independent substream containing the main audio programme. The main audio 
programme shall only be delivered in independent substream 0 and dependent substreams 
associated with independent substream 0. All substreams within an Enhanced AC-3 bitstream 
shall be encoded with the same number of audio blocks per syncframe. 

Decoding: An IP-IRD that supports Enhanced AC-3 shall be able to accept Enhanced AC-3 elementary 
streams that contain more than one substream. IP-IRDs shall be capable of decoding independent 
substream 0. 

6.4.3 Bit rate 

Encoding: Audio may be encoded at any bit rate up to and including 3 024 kbps. 

Decoding: An IP-IRD that supports Enhanced AC-3 shall support a maximum bit rate of 3 024 kbps. 

6.4.4 Sampling frequency 

Encoding: Audio may be encoded at a sample rate of 32 kHz, 44,1 kHz or 48 kHz. All substreams present in 
an Enhanced AC-3 bitstream shall be encoded at the same sample rate. 

Decoding: An IP-IRD that supports Enhanced AC-3 shall support sample rates of 32 kHz, 44,1 kHz and 
48 kHz. 



6.4.5 Stream mixing 



In some applications, the audio decoder may be capable of simultaneously decoding two different programme elements, 
carried in two separate Enhanced AC-3 elementary streams, or in separate independent substreams within a single 
Enhanced AC-3 elementary stream, and then combining the programme elements into a complete programme. 

Encoding: The elementary stream or independent substream that carries the associated audio services to be 
mixed with the main programme audio shall not contain more audio channels than the main audio 
programme. 

The elementary stream or independent substream carrying the associated audio service shall 
contain mixing metadata, as defined in TS 102 366 [21], for use by the decoder to control the 
mixing process. 

To match the default user volume adjustment setting in the decoder, the pgmscl field in the 
associated programme elementary stream or independent substream shall be set to a positive 
value of 12 dB. 

A minimum functionality mixer is described in clause E.4 of TS 102 366 [21]. Elementary streams 
or independent substreams intended to be combined together for reproduction according to this 
mixing process shall meet the following constraints: 

The elementary stream or independent substream that carries the associated audio services to 
be mixed with the main programme audio shall contain no more than two audio channels. 

Decoding: If audio access units from two audio services which are to be simultaneously decoded do not have 
identical RTP timestamp values indicated in their corresponding RTP headers (indicating that the 
audio encoding was not frame synchronous) then the audio frames (access units) of the main 
audio service shall be presented to the audio decoder for decoding and presentation at the time 
indicated by the RTP timestamp. An associated service, which is being simultaneously decoded, 
shall have its audio frames (access units), which are in closest time alignment (as indicated by the 
RTP timestamp) to those of the main service being decoded, presented to the audio decoder for 
simultaneous decoding. 
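
The "closest time alignment" rule above can be illustrated with a short sketch. It assumes that both services use the same RTP clock rate and that the timestamps compared do not straddle a 32-bit wrap-around; the function name is hypothetical.

```python
def closest_associated_access_unit(main_ts: int, associated_ts: list) -> int:
    """Return the RTP timestamp of the associated-service access unit that is
    closest in time to the main-service access unit with timestamp main_ts.

    Assumption: same RTP clock rate for both services, no timestamp wrap-around.
    """
    return min(associated_ts, key=lambda ts: abs(ts - main_ts))
```

For a main access unit at timestamp 3 072 and associated access units at 1 500, 3 000 and 4 600, the access unit at 3 000 would be presented for simultaneous decoding.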

IP-IRDs shall set the default user volume adjustment of the associated programme level to minus 
12 dB. 

Annex A (informative): 

Description of the implementation guidelines 



A.1 Introduction 



The present document defines how advanced audio and video compression algorithms may be used for all DVB 
services delivered directly over IP protocols without the use of an intermediate MPEG-2 Transport Stream. An example 
of this type of DVB service is DVB-H, using multi-protocol encapsulation. The corresponding guidelines for 
audio-visual coding for DVB services which use an MPEG-2 Transport Stream are given in TS 101 154 [7] for 
distribution services and in TS 102 154 [8] for contribution services. Examples of Transport Stream based DVB service 
are the familiar DVB-S, DVB-C and DVB-T transmissions. 

The "systems layer" of the present document addresses issues related to transport and synchronization of advanced 
audio and video. The systems layer is based on the use of RTP, a generic Transport Protocol for Real-Time 
Applications as defined in RFC 3550 [3]. Use of RTP requires the definition of payload formats that are specific for 
each content format, and so the system layer specifies which RTP payload formats to use for transport of advanced 
audio and video, as well as applicable constraints for that. Further information on the systems layer is given in 
clause A.2. 

The advanced video coding uses either H.264/AVC, as specified in ITU-T Recommendation H.264 [1] and in 
ISO/IEC 14496-10 [1], or else VC-1, as specified in SMPTE 421M [17]. Both algorithms use an architecture based on a 
motion-compensated block transform, like the older MPEG-1 and MPEG-2 algorithms. However, unlike the earlier 
algorithms, they have smaller, dynamically selected block sizes to allow the encoder to represent both large and small 
moving objects more efficiently. They also support greater precision in the representation of motion vectors and use 
more sophisticated variable-length coding to represent the coded information more efficiently. Both algorithms include 
loop filtering to help reduce the visibility of blocking artefacts that may appear when the encoder is highly stressed by 
extremely critical source material. For further information on the video codecs see clause A.3. 

The advanced audio coding uses either MPEG-4 HE AAC v2 audio, as specified in ISO/IEC 14496-3 [2], or else 
Extended AMR-WB (AMR-WB+) audio as specified in TS 126 290 [12], or else AC-3 or Enhanced AC-3 audio as 
specified in TS 102 366 [21]. The MPEG-4 HE AAC v2 Profile is derived from the MPEG-2 Advanced Audio Coding 
(AAC), first published in 1997. MPEG-4 AAC is closely based on MPEG-2 AAC but includes some further 
enhancements such as perceptual noise substitution to give better performance at low bit rates. The MPEG-4 HE AAC 
Profile adds spectral band replication, to allow more efficient representation of high-frequency information by using the 
lower harmonic as a reference. The MPEG-4 HE AAC v2 Profile adds the parametric stereo tool to the MPEG-4 HE 
AAC Profile, to allow a more efficient representation of the stereo image at low bit rates. Extended AMR-WB 
(AMR-WB+) has been optimized for use at low bit rates with source material where speech predominates. AC-3 is an 
audio coding format designed to encode multiple channels of audio into a low bit rate format. Dolby Digital, which is a 
branded version of AC-3, encodes up to 5.1 channels of audio. Enhanced AC-3 is a development of AC-3 that improves 
low data rate performance and supports a more flexible bitstream syntax to support new audio services. For further 
information on the audio codecs see clause A.4. 

A wide range of potential applications are covered by the present document, ranging from HDTV services to 
low-resolution services delivered to small portable receivers. A particular example of the latter type of service is the 
DVB IP Datacast application. A common generic toolbox is used by all DVB services, where each DVB application can 
select the most appropriate tool from within that toolbox. Annex B of the specification defines application-specific 
constraints on the use of the toolbox for the particular case of DVB IP Datacast services. For further information on the 
DVB IP Datacast application and the background to the constraints that have been defined, see clause A.5. 

A.2 Systems 

A.2.1 Protocol stack 

For delivery of DVB Services over IP-based networks a protocol stack is defined in a suite of DVB specifications. The 
systems part of the present document addresses only the part of the protocol stack that is related to the transport and 
synchronization of audio and video. This part of the DVB-IP protocol stack is given in Figure A.1. For completeness, 
RTCP and RTSP are also included, as they are relevant for RTP usage, though there are no specific guidelines for 
RTCP and RTSP defined in the present document. 



[Figure A.1 box labels, from top to bottom: Service offering; H.264/AVC, VC-1; HE AAC v2, AMR-WB+, AC3, E-AC3; RTP; RTCP; RTSP; UDP; TCP; IP; Physical and data link layers: Ethernet, 1394, etc.] 

NOTE: Specifications for RTCP and RTSP usage are beyond the scope of the present document. 

Figure A.1: The part of the DVB-IP protocol stack relevant 
for the transport of advanced audio and video 

The transport of audio and video data is based on RTP, a generic Transport Protocol for Real-Time Applications as 
defined in RFC 3550 [3]. RFC 3550 [3] specifies the elements of the RTP transport protocol that are independent of the 
data that is transported, while separate RFCs define how to use RTP for transport of specific data such as coded audio 
and video. 



A.2.2 Transport of H.264/AVC video 



To transport H.264/AVC video data, RFC 3984 [5] is used. The H.264/AVC specification [1] distinguishes conceptually 
between a Video Coding Layer (VCL), and a Network Abstraction Layer (NAL). The VCL contains the video features 
of the codec (transform, quantization, motion compensation, loop filter, etc.). The NAL layer formats the VCL data into 
Network Abstraction Layer units (NAL units) suitable for transport across the applied network or storage medium. A 
NAL unit consists of a one-byte header and the payload; the header indicates the type of the NAL unit and other 
information, such as the (potential) presence of bit errors or syntax violations in the NAL unit payload, and information 
regarding the relative importance of the NAL unit for the decoding process. RFC 3984 [5] specifies how to carry NAL 
units in RTP packets. 
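
As an illustration of the one-byte NAL unit header described above, the following sketch extracts its three fields using the field names of the H.264/AVC syntax (forbidden_zero_bit, nal_ref_idc, nal_unit_type). The function name and the dictionary return type are choices made for this example.

```python
def parse_nal_unit_header(first_byte: int) -> dict:
    """Split the one-byte NAL unit header into its three fields."""
    return {
        # 1 bit: set to 1 to signal possible bit errors or syntax violations
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        # 2 bits: relative importance of the NAL unit for the decoding process
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        # 5 bits: type of the NAL unit
        "nal_unit_type": first_byte & 0x1F,
    }
```

For the header byte 0x67, this yields forbidden_zero_bit 0, nal_ref_idc 3 and nal_unit_type 7.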



A.2.3 Transport of VC-1 video 



To transport VC-1, RFC 4425 [18] is used. Each RTP packet contains an integer number of Access Units as defined in 
RFC 4425 [18], which are byte-aligned. Each Access Unit (AU) starts with the AU header, followed by a variable 
length payload. The AU payload normally contains data belonging to exactly one VC-1 frame. However, the data may 
be split between multiple AUs if it would otherwise cause the RTP packet to exceed the Maximum Transmission Unit 
(MTU) size, to avoid IP-level fragmentation. 



ETSI TS 102 005 V1.3.1 (2007-07) 



In the VC-1 Advanced Profile, the sequence layer header contains the parameters required to initialize the VC-1 
decoder. These parameters apply to all entry-point segments until the next occurrence of a sequence layer header in the 
coded bit stream. Neither a sequence layer header nor an entry-point segment header is defined for the VC-1 Simple and 
Main Profiles. For these profiles, the decoder initialization parameters are conveyed as Decoder Initialization Metadata 
structures (see annex J of SMPTE 421M [17]) carried in the SDP datagrams signalling the VC-1-based session. 



A.2.4 Transport of HE AAC v2 audio 



To transport HE AAC v2, RFC 3640 [4] is used. RFC 3640 [4] supports both implicit signalling and explicit 
signalling by means of conveying the AudioSpecificConfig() as the required MIME parameter "config", as defined in 
RFC 3640 [4]. The framing structure defined in RFC 3640 [4] supports carriage of multiple AAC frames in one 
RTP packet with optional interleaving to improve error resiliency in packet loss. For example, if each RTP packet 
carries three AAC frames, then with interleaving the RTP packets may carry the AAC frames as given in Figure A.2. 



[Figure A.2 panels: "RTP packets, no interleaving" and "RTP packets, with interleaving", each shown for the cases "No packet loss" and "Packet loss"; packets P1 to P3 carry AAC frames 1 to 9.] 

Figure A.2: Interleaving of AAC frames 

Without interleaving, RTP packet P1 carries the AAC frames 1, 2 and 3, while packets P2 and P3 carry the frames 
4, 5 and 6 and the frames 7, 8 and 9, respectively. When P2 gets lost, AAC frames 4, 5 and 6 get lost, and hence 
the decoder needs to reconstruct three missing AAC frames that are contiguous. In this example, interleaving is applied 
so that P1 carries 1, 4 and 7, P2 carries 2, 5 and 8, and P3 carries 3, 6 and 9. When P2 gets lost in this case, again three 
frames get lost, but due to the interleaving, the frames that are immediately adjacent to each lost frame are received and 
can be used by the decoder to reconstruct the lost frames, thereby exploiting the typical temporal redundancy between 
adjacent frames to improve the perceptual performance of the receiver. 
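
The effect described above can be reproduced with a small simulation. This is only a sketch of the frame-to-packet assignment used in the example; it does not model the RFC 3640 [4] payload syntax, and both function names are hypothetical. It assumes the number of frames is a multiple of the number of frames per packet.

```python
def packetize(frames, frames_per_packet, interleave):
    """Group frames into packets, with or without interleaving."""
    n_packets = len(frames) // frames_per_packet
    if interleave:
        # Packet i carries frames i, i + n_packets, i + 2 * n_packets, ...
        return [frames[i::n_packets] for i in range(n_packets)]
    return [frames[i * frames_per_packet:(i + 1) * frames_per_packet]
            for i in range(n_packets)]


def longest_run_of_lost_frames(packets, lost_packet_index, total_frames):
    """Length of the longest run of consecutive frames lost with one packet."""
    lost = set(packets[lost_packet_index])
    longest = run = 0
    for frame in range(1, total_frames + 1):
        run = run + 1 if frame in lost else 0
        longest = max(longest, run)
    return longest


frames = list(range(1, 10))            # AAC frames 1 to 9
plain = packetize(frames, 3, False)    # P1 = 1,2,3  P2 = 4,5,6  P3 = 7,8,9
mixed = packetize(frames, 3, True)     # P1 = 1,4,7  P2 = 2,5,8  P3 = 3,6,9
```

Losing P2 (index 1) leaves a gap of three contiguous frames without interleaving, but only isolated single-frame gaps with interleaving.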



A.2.5 Transport of AMR-WB+ audio 



To transport AMR-WB+, RFC 4352 [13] is used. That payload format is also used in both 3GPP Release TS 126 234 [10] and 
TS 126 346 [16], in which AMR-WB+ is the recommended codec with HE AAC v2. 

The framing structure defined in [13] supports carriage of multiple AMR-WB+ frames in one RTP packet with 
optional interleaving to improve error resiliency in packet loss. The overhead due to the payload format starts from three bytes per 
RTP packet. The use of interleaving increases the overhead per packet slightly: by a minimum of 4 bits for each frame in the 
payload (rounded upwards to full bytes in case of an odd number of frames). 
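
Reading the figures above literally, the minimum payload overhead per RTP packet can be worked out as in the following sketch. The exact header layout is defined in RFC 4352 [13] and is not modelled here; the function name is hypothetical.

```python
def min_payload_overhead_bytes(frames_per_packet: int, interleaved: bool) -> int:
    """Minimum payload overhead per RTP packet, in bytes.

    Assumption: 3 bytes of base overhead, plus 4 bits per frame when
    interleaving is used, rounded up to a whole number of bytes.
    """
    overhead = 3
    if interleaved:
        overhead += (frames_per_packet * 4 + 7) // 8  # round up to full bytes
    return overhead
```

With three frames per packet this gives 3 bytes without interleaving and 5 bytes with interleaving (12 bits rounded up to 2 bytes).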

For example, if each RTP packet carries three AMR-WB+ frames, then with interleaving the AMR-WB+ packets may 
carry the AMR-WB+ frames as given in Figure A.3. 

Figure A.3: Interleaving of AMR-WB+ frames 

Without interleaving, RTP packet P1 carries the AMR-WB+ frames 1, 2 and 3, while packets P2 and P3 carry the 
frames 4, 5 and 6 and the frames 7, 8 and 9, respectively. When P2 gets lost, AMR-WB+ frames 4, 5 and 6 get lost, 
and hence the decoder needs to reconstruct three missing AMR-WB+ frames that are contiguous. In this example, 
interleaving is applied so that P1 carries 1, 4 and 7, P2 carries 2, 5 and 8, and P3 carries 3, 6 and 9. When P2 gets lost in 
this case, again three frames get lost, but due to the interleaving, the frames that are immediately adjacent to each lost 
frame are received and can be used by the decoder to reconstruct the lost frames, thereby exploiting the typical temporal 
redundancy between adjacent frames to improve the perceptual performance of the receiver. 



A.2.6 Transport of AC-3 audio 



To transport AC-3 audio, RFC 4184 [22] is used. The framing structure defined in RFC 4184 [22] supports carriage of 
multiple AC-3 frames in one RTP packet. It also supports fragmentation of AC-3 frames in cases where the frame 
exceeds the Maximum Transmission Unit (MTU) of the network. Fragmentation may take into account the partial frame 
decoding capabilities of AC-3 to achieve higher resilience to packet loss by setting the fragmentation boundary at the 
"5/8ths point" of the frame. 

A.2.7 Transport of enhanced AC-3 audio 

To transport Enhanced AC-3 audio, RFC 4598 [23] is used. The framing structure defined in RFC 4598 [23] supports 
carriage of multiple Enhanced AC-3 frames in one RTP packet. Recommendations for concatenation decisions which 
reduce the impact of packet loss by taking into account the configuration of multiple channels and programs are 
provided. It also supports fragmentation of Enhanced AC-3 frames in cases where the frame exceeds the MTU of the 
network. 

A.2.8 Synchronization of content delivered over IP 

RTP also provides tools for synchronization. For that purpose, an RTP time stamp is present in the RTP header; the 
RTP time stamps are used to determine the presentation time of the audio and video access units. The method to 
synchronize content transported in RTP packets is described in RFC 3550 [3]. A simplified summary, with reference to 
Figure A.4, is given below: 

a) RTP time stamps convey the sampling instant of access units at the encoder. The RTP time stamp is expressed 
in units of a clock, which is required to increase monotonically and linearly. The frequency of this clock is 
specified for each payload format, either explicitly or by default. Often, but not necessarily, this clock is the 
sampling clock. In Figure A.4, TSa(i) and TSv(j) are RTP time stamps that are used to present the access units 
at the correct timing at the receiver; this requires that the receiver reconstructs the video clock and audio clock 
with the same mutual offset in time as at the sender. 

b) When transporting RTP packets, the RTP Control Protocol (RTCP), also defined in RFC 3550 [3], is used for 
purposes such as monitoring and control. RTCP data is carried in RTCP packets. There are several RTCP 
packet types, one of which is the Sender Report (SR) RTCP packet type. Each RTCP SR packet contains an 
RTP time stamp and an NTP time stamp; both time stamps correspond to the same instant in time. However, 
the RTP time stamp is expressed in the same units as RTP time stamps in data packets, while the NTP time 
stamp is expressed in "wallclock" time; see clause 4 of RFC 3550 [3]. In Figure A.4, NTPa(k) and NTPv(n) 
are the NTP time stamps of the audio and video RTCP packets. At(k) and Vt(n) are the values of the audio and 
video clock at the same instant in time as NTPa(k) and NTPv(n), respectively. Each SR(k) for audio provides 
NTPa(k) as NTP time stamp and At(k) as RTP time stamp. Similarly, each SR(n) for video provides NTPv(n) 
as NTP time stamp and Vt(n) as RTP time stamp. 



Figure A.4: RTP tools for synchronization 

c) Synchronized playback of streams is only possible if the streams use the same wall-clock to encode NTP 
values in SR packets. If the same wall-clock is used, receivers can achieve synchronization by using the 
correspondence between RTP and NTP time stamps. To synchronize an audio and a video stream, one needs to 
receive an RTCP SR packet relating to the audio stream, and an RTCP SR packet relating to the video stream. 
These SR packets provide a pair of NTP timestamps and their corresponding RTP timestamps that is used to 
align the media. For example, in Figure A.4, [NTPv(n) - NTPa(k)] represents the offset in time between Vt(n) 
and At(k), expressed in wallclock time. 

d) The time between sending subsequent RTCP SR packets may vary; the default RTCP timing rules suggest 
sending an RTCP SR packet every 5 s. This means that upon entering a streaming session there may be an initial 
delay - on average a 2,5 s duration if the default RTCP timing rules are used - when the receiver does not yet 
have the necessary information to perform inter-stream synchronization. 
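
Steps a) to d) above can be sketched as follows. The class and function names are hypothetical, the NTP time stamp is simplified to a floating-point number of seconds, and the clock rates used in the usage note are example values rather than requirements of the present document.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    ntp_seconds: float   # "wallclock" time carried in the RTCP SR packet
    rtp_timestamp: int   # the same instant expressed on the stream's RTP clock


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport, clock_rate_hz: int) -> float:
    """Map an RTP time stamp to wallclock time using the latest SR of its stream."""
    delta = (rtp_ts - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:          # interpret as a signed 32-bit difference
        delta -= 0x100000000
    return sr.ntp_seconds + delta / clock_rate_hz


def average_initial_wait_s(sr_interval_s: float) -> float:
    """Average wait for the first SR when joining at a random instant."""
    return sr_interval_s / 2
```

With an audio SR of (1000,5 s, 48 000) on a 48 kHz clock and a video SR of (1000,0 s, 90 000) on a 90 kHz clock, the audio access unit with RTP time stamp 72 000 and the video access unit with RTP time stamp 180 000 both map to wallclock time 1001,0 s and are therefore presented together. With the default 5 s interval, `average_initial_wait_s(5.0)` gives the 2,5 s mentioned in item d).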

A.2.9 Synchronization with content delivered over MPEG-2 TS 

Applications may require synchronization of audiovisual content delivered over IP with content delivered over an 
MPEG-2 TS. For example, a broadcaster may wish to provide audio in another language as part of a broadcast program, 
but using transport over IP instead of transporting this additional audio stream over the same MPEG-2 TS as the 
broadcast program. 

Synchronization of a stream delivered over IP with a broadcast program requires that the receiver knows the timing 
relationship between the RTP time stamps of the stream that is delivered over IP and the MPEG-2 time stamps of the 
broadcast program. It is beyond the scope of the present document how to convey such timing relationship. 

A.2.10 Service discovery 

For discovery of DVB services over IP, refer to the IPI specification for low and mid level (PSI/SI equivalent) 
functionality and to the GBS specification for higher level (SI/metadata related, except structures and containers) 
functionality. 

A.2.11 Linking to applications 

Audio and video delivered over IP can be presented in an MHP application by means of including appropriate URLs. 



A.2.12 Capability exchange 



By means of capability exchange protocols the sender and receiver can communicate whether the receiver has A, B, C, 
D or E IP-IRD capabilities for H.264/AVC decoding. In addition, it can also be communicated whether the receiver has 
multi-channel or only mono/stereo capabilities for HE AAC v2 decoding or whether the receiver supports AMR-WB+, 
AC-3 or Enhanced AC-3 decoding, and whether decoding of multiple Enhanced AC-3 substreams is supported. For 
capability exchange protocols, refer to the IPI specification. 

A.3 Video 



A.3.1 H.264/AVC video 

A.3.1.1 Overview 

The part of the H.264/AVC standard referenced in the present document specifies the coding of video (in 4:2:0 chroma 
format) that contains either progressive or interlaced frames, which may be mixed together in the same sequence. 
Generally, a frame of video contains two interleaved fields, the top and the bottom field. The two fields of an interlaced 
frame, which are separated in time by a field period (half the time of a frame period), may be coded separately as two 
fields or together as a frame. A progressive frame should always be coded as a single frame; however, it can still be 
considered to consist of two fields at the same instant of time. H.264/AVC covers a Video Coding Layer (VCL), which 
is designed to efficiently represent the video content, and a Network Abstraction Layer (NAL), which formats the VCL 
representation of the video and provides header information in a manner appropriate for conveyance by a variety of 
transport layers or storage media. The structure of the H.264/AVC video encoder is shown in Figure A.5. 

Figure A.5: Structure of H.264/AVC video encoder 

A.3.1.2 Network Abstraction Layer (NAL) 

The Video Coding Layer (VCL), which is described below, is specified to efficiently represent the content of the video 
data. The Network Abstraction Layer (NAL) is specified to format that data and provide header information in a manner 
appropriate for conveyance by the transport layers or storage media. All data are contained in NAL units, each of which 
contains an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and 
bitstream systems. The format of NAL units for both packet-oriented transport and bitstream is identical except that 
each NAL unit can be preceded by a start code prefix in a bitstream-oriented transport layer. The NAL facilitates the 
ability to map H.264/AVC VCL data to transport layers such as: 

• RTP/IP for any kind of real-time wire-line and wireless Internet services (conversational and streaming); 

• File formats, e.g. ISO "MP4" for storage and MMS; 

• H.32X for wireline and wireless conversational services; 

• MPEG-2 systems for broadcasting services, etc. 

The full degree of customization of the video content to fit the needs of each particular application was outside the 
scope of the H.264/AVC standardization effort, but the design of the NAL anticipates a variety of such mappings. 
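
As a rough illustration of the bitstream-oriented case described above, the following Python sketch splits a buffer on start code prefixes and reads the one-byte NAL unit header. The three-byte prefix value 0x000001 and the header fields (forbidden_zero_bit, nal_ref_idc, nal_unit_type) follow H.264/AVC [1]; the function names and the sample bytes are illustrative assumptions, and emulation prevention bytes are not handled.

```python
START_CODE = b"\x00\x00\x01"  # start code prefix used in bitstream-oriented transport


def split_byte_stream(stream):
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes in front of the next prefix are padding, not NAL unit data.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            units.append(nal)
        pos = nxt
    return units


def parse_nal_header(nal):
    """Decode the first byte of a NAL unit into its three header fields."""
    first = nal[0]
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x03,
        "nal_unit_type": first & 0x1F,
    }


# Hypothetical sample: a sequence parameter set, a picture parameter set and an IDR slice.
sample = (
    b"\x00\x00\x00\x01" + b"\x67\x42\xc0\x1e"
    + b"\x00\x00\x00\x01" + b"\x68\xce\x3c\x80"
    + b"\x00\x00\x01" + b"\x65\x88\x84"
)
nal_types = [parse_nal_header(n)["nal_unit_type"] for n in split_byte_stream(sample)]
print(nal_types)  # prints [7, 8, 5]
```

In a packet-oriented transport such as RTP the same NAL units would be carried without the start code prefix, so only parse_nal_header() would apply.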

One key concept of the NAL is parameter sets. A parameter set contains information that is expected to 
change rarely over time. There are two types of parameter sets: 

• sequence parameter sets, which apply to a series of consecutive coded video pictures; and 

• picture parameter sets, which apply to the decoding of one or more individual pictures. 

The sequence and picture parameter set mechanism decouples the transmission of infrequently changing information 
from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit 
contains an identifier that refers to the content of the relevant picture parameter set, and each picture parameter set 
contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of 
data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that 
information within each VCL NAL unit. 
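
The indirection described above can be sketched in Python as two lookup tables keyed by identifier. This is only a minimal model: the class names, the store/activate methods and the fields other than the two identifiers are hypothetical and much simpler than the real syntax structures.

```python
from dataclasses import dataclass


@dataclass
class SequenceParameterSet:
    sps_id: int
    width_in_mbs: int    # illustrative field, not the real syntax element
    height_in_mbs: int   # illustrative field, not the real syntax element


@dataclass
class PictureParameterSet:
    pps_id: int
    sps_id: int          # identifier of the sequence parameter set referred to
    cabac: bool          # illustrative field


class ParameterSetStore:
    """Keeps received parameter sets so that VCL NAL units can refer to them by identifier."""

    def __init__(self):
        self._sps = {}
        self._pps = {}

    def store_sps(self, sps):
        self._sps[sps.sps_id] = sps

    def store_pps(self, pps):
        self._pps[pps.pps_id] = pps

    def activate(self, pps_id):
        """Resolve the identifier carried in a VCL NAL unit to (SPS, PPS)."""
        pps = self._pps[pps_id]
        sps = self._sps[pps.sps_id]
        return sps, pps


store = ParameterSetStore()
store.store_sps(SequenceParameterSet(sps_id=0, width_in_mbs=45, height_in_mbs=36))
store.store_pps(PictureParameterSet(pps_id=0, sps_id=0, cabac=False))
store.store_pps(PictureParameterSet(pps_id=1, sps_id=0, cabac=True))

# A VCL NAL unit only needs to carry the small identifier (here 1).
sps, pps = store.activate(1)
```

Both picture parameter sets in the example share one sequence parameter set, which is the saving the mechanism is designed to provide.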



A.3.1.3 Video Coding Layer (VCL) 



The video coding layer of H.264/AVC is similar in spirit to other standards such as MPEG-2 Video. It consists of a 
hybrid of temporal and spatial prediction in conjunction with transform coding. Figure A.6 shows a block diagram of 
the video coding layer for a macroblock, which consists of a 16x16 luma block and two 8x8 chroma blocks. 

Figure A.6: Basic coding structure for H.264/AVC for a macroblock 

In summary, the picture is split into macroblocks. The first picture of a sequence or a random access point is typically 
coded in Intra mode, i.e. without using any information other than that contained in the picture itself. Each sample of 
a luma or chroma block of a macroblock in such an Intra frame is predicted using spatially neighbouring samples of 
previously coded blocks. The encoder chooses which neighbouring samples are used for Intra prediction and how; the 
prediction itself is carried out identically at the encoder and the decoder using the transmitted Intra prediction side 
information. 

For all remaining pictures of a sequence or between random access points, typically Inter coding is utilized. Inter coding 
employs prediction (motion compensation) from other previously decoded pictures. The encoding process for Inter 
prediction (motion estimation) consists of choosing motion data comprising the reference picture and a spatial 
displacement that is applied to all samples of the macroblock. The motion data, which are transmitted as side 
information, are used by both the encoder and the decoder to form the same Inter prediction signal. 

The residual of the prediction (either Intra or Inter), which is the difference between the original and the predicted 
macroblock, is transformed. The transform coefficients are scaled and quantized. The quantized transform coefficients 
are entropy coded and transmitted together with the side information for either Intra-frame or Inter-frame prediction. 

The encoder contains the decoder to conduct prediction for the next blocks or next picture. Therefore, the quantized 
transform coefficients are inverse scaled and inverse transformed in the same way as at the decoder side resulting in the 
decoded prediction residual. The decoded prediction residual is added to the prediction. The result of that addition is fed 
into a deblocking filter which provides the decoded video as its output. 
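
The reason the encoder contains the decoder can be seen in a toy Python sketch of the prediction loop. It is deliberately simplified and is not the H.264/AVC algorithm: the transform, entropy coding and deblocking filter are omitted, a single uniform quantizer step (QSTEP, an assumed value) stands in for scaling and quantization, and the sample values are made up.

```python
def encode_block(original, prediction, qstep):
    """Form the prediction residual and quantize it (transform omitted in this toy model)."""
    residual = [o - p for o, p in zip(original, prediction)]
    return [int(round(r / qstep)) for r in residual]


def reconstruct_block(levels, prediction, qstep):
    """Inverse-scale the quantized residual and add it to the prediction."""
    return [p + lvl * qstep for p, lvl in zip(prediction, levels)]


QSTEP = 4                           # assumed quantizer step size
original = [100, 104, 98, 101]      # made-up sample values
prediction = [96, 96, 96, 96]       # Intra or Inter prediction signal

levels = encode_block(original, prediction, QSTEP)

# The encoder runs the same reconstruction as the decoder, so both sides
# hold identical reference samples for predicting the next block or picture.
encoder_reference = reconstruct_block(levels, prediction, QSTEP)
decoder_output = reconstruct_block(levels, prediction, QSTEP)
```

If the encoder predicted from the original samples instead of encoder_reference, the quantization error would accumulate at the decoder from block to block.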

The new features of H.264/AVC compared to MPEG-2 Video are listed as follows: variable block-size motion 
compensation with small block sizes from 16x16 luma samples down to 4x4 luma samples per block, 
quarter-sample-accurate motion compensation, motion vectors pointing over picture boundaries, multiple reference 
picture motion compensation, decoupling of referencing order from display order, decoupling of picture representation 
methods from picture referencing capability, weighted prediction, improved "skipped" and "direct" motion inference, 
directional spatial prediction for intra coding, in-the-loop deblocking filtering, 4x4 block-size transform, hierarchical 
block transform, short word-length/exact-match inverse transform, context-adaptive binary arithmetic entropy coding, 
flexible slice size, FMO, ASO, redundant pictures, data partitioning, SP/SI synchronization/switching pictures. 

A.3.1.4 Explanation of H.264/AVC profiles and levels 

Profiles and levels specify conformance points. These conformance points are designed to facilitate interoperability 
between various applications of the standard that have similar functional requirements. A profile specifies a set of 
coding tools or algorithms that can be used in generating a conforming bit-stream, whereas a level places constraints on 
certain key parameters of the bitstream. All decoders conforming to a specific profile must support all features in that 
profile. Encoders are not required to make use of any particular set of features supported in a profile but have to provide 
conforming bitstreams, i.e. bitstreams that can be decoded by conforming decoders. 

The first version of H.264/AVC was published in May 2003 by ITU-T as Recommendation H.264 [1] and by ISO/IEC 
as 14496-10 [1]. Three Profiles define sub-sets of the syntax and semantics: 

• Baseline Profile. 

• Extended Profile. 

• Main Profile. 

The Fidelity Range Extensions Amendment of H.264/AVC, agreed in July 2004, added some additional tools and 
defined four new Profiles (of which only the first is relevant for the present document): 

• High Profile. 

• High 10 Profile. 

• High 4:2:2 Profile. 

• High 4:4:4 Profile. 

The relationship between High Profile and the original three Profiles, in terms of the major tools from the toolbox that 
may be used, is illustrated by Figure A.7. 

Figure A.7: Relationship between high profile and the three original profiles 

The present document only uses Baseline, Main, and High Profile. These contain the following features: 

Baseline Profile: 

The Baseline Profile contains the following restricted set of coding features. 

• I and P Slices: Intra coding of macroblocks through the use of I slices; P slices add the option of Inter coding 
using one temporal prediction signal. 

• 4x4 Transform: The prediction residual is transformed and quantized using 4x4 blocks. 

• CAVLC: The symbols of the coder (e.g. quantized transform coefficients, intra predictors, motion vectors) are 
entropy-coded using a variable length code. 

• FMO: This feature of Baseline allowing arbitrary sampling of the Macroblocks within a slice is not used in the 
present document. The main reason is to achieve decodability by Main or High profile decoders, which is 
signalled by constraint_set1_flag being equal to 1. 

• ASO: This feature of Baseline allowing arbitrary order of slices within a picture is not used in the present 
document. The main reason is to achieve decodability by Main or High profile decoders, which is signalled by 
constraint_set1_flag being equal to 1. 

• Redundant Slices: This feature of Baseline allowing transmission of redundant slices that approximate the 
primary slice is not used in the present document. The main reason is to achieve decodability by Main or High 
profile decoders, which is signalled by constraint_set1_flag being equal to 1. 






ETSI TS 102 005 V1.3.1 (2007-07) 



Main Profile: 

Except for FMO, ASO, and Redundant Slices, Main Profile contains all features of Baseline Profile and the following 
additional ones: 

• B Slices: Enhanced Inter coding using up to two temporal prediction signals that are superimposed for the 
predicted block. 

• Weighted Prediction: Allowing the temporal prediction signal in P and B slices to be weighted by a factor. 

• CABAC: An alternative entropy coding to CAVLC providing higher coding efficiency at higher complexity, 
which is based on context-adaptive binary arithmetic coding. 

High Profile: 

High Profile contains all features of Main Profile and the following additional ones: 

• 8x8 Transform: In addition to the 4x4 Transform, the encoder can choose to code the prediction residual using 
an 8x8 Transform. 

• Quantization Matrix: The encoder can choose to apply weights to the transform coefficients, which provides a 
weighted fidelity of reproduction for these. 
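
The relationship between the three profiles used in the present document can be summarized as set relations. The Python sketch below models only the tools named in the lists above (the string labels are informal, not syntax elements) and shows why a Baseline bitstream that avoids FMO, ASO and redundant slices uses nothing outside the Main and High tool sets.

```python
# Tools as listed above; the labels are informal names chosen for this sketch.
BASELINE = {"I slices", "P slices", "4x4 transform", "CAVLC",
            "FMO", "ASO", "redundant slices"}
NOT_IN_MAIN = {"FMO", "ASO", "redundant slices"}

MAIN = (BASELINE - NOT_IN_MAIN) | {"B slices", "weighted prediction", "CABAC"}
HIGH = MAIN | {"8x8 transform", "quantization matrix"}

# Baseline restricted as required by constraint_set1_flag = 1.
CONSTRAINED_BASELINE = BASELINE - NOT_IN_MAIN

print(CONSTRAINED_BASELINE <= MAIN <= HIGH)  # prints True
print(BASELINE <= MAIN)                      # prints False
```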

A.3.1.5 Summary of key tools and parameter ranges for capability A to E IRDs 

Table A.1 summarizes the assignment of profiles and levels to the five IP-IRDs that are specified in the present 
document. 

Table A.1 

Capability | Mandatory profile | Optional profile | Additional constraint on mandatory profile | Level | Max frame size (macroblocks) | Example video formats                                               | Max bit rate (kbit/s) 
-----------|-------------------|------------------|--------------------------------------------|-------|------------------------------|---------------------------------------------------------------------|---------------------- 
A          | Baseline          | Main or High     | constraint_set1_flag = 1                   | 1b    | 99                           | 176x144, 15 Hz                                                      | 128 
B          | Baseline          | Main or High     | constraint_set1_flag = 1                   | 1.2   | 396                          | 352x288, 15 Hz; QCIF = 176x144, 30 Hz                               | 384 
C          | Baseline          | Main or High     | constraint_set1_flag = 1                   | 2     | 396                          | CIF = 352x288, 30 Hz                                                | 2 000 
D          | Main              | High             | none                                       | 3     | 1 620                        | 625 SD = 720 x 576, 25 Hz; 525 SD = 720 x 480, 30 Hz                | 10 000 
E          | High              | -                | none                                       | 4     | 8 192                        | 1 080i HD = 1 920 x 1 080, 25/30 Hz; 720p HD = 1 280 x 720, 50/60 Hz | 20 000 

The following should be noted. 

IP-IRDs with Capability A, B, and C specify the Baseline profile with the additional constraint that constraint_set1_flag 
must be set equal to 1, making these bitstreams also decodable by Main or High profile decoders. The reason for this 
additional constraint is that our investigations have shown that the features that are contained in Baseline but not 
in Main profile (FMO, ASO, and redundant pictures), and that are disabled by setting constraint_set1_flag equal 
to 1, do not provide any benefit at the packet error rates envisioned to be typical for the applications in which the present 
document will be used. IP-IRDs with capability D must conform to Main profile without any additional 
constraints. IP-IRDs with capability E must conform to High profile without any additional constraints. 

Because of the additional constraint and the requirements in H.264/AVC, IP-IRDs labelled with a particular capability 
Y are capable of decoding and rendering pictures that can be decoded by IP-IRDs labelled with a particular capability X 
with X being an earlier letter than Y in the alphabet. For instance, Capability D IP-IRDs are capable of decoding 
bitstreams conforming to Main Profile at level 3 of H.264/AVC and below. Additionally, Capability D IP-IRDs are 
capable of decoding bitstreams that are also decodable by IP-IRDs with capabilities A, B, or C. 

In addition to the mandatory requirements on IP-IRDs and Bitstreams, the optional use of the following Bitstreams is 
allowed given that the IP-IRD is capable of decoding it. For Capability A, B, and C Bitstreams, encoders may 
optionally generate Main or High Profile bitstreams. For Capability D Bitstreams, encoders may optionally generate 
High Profile bitstreams. 
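
The decodability rules above can be sketched as a small Python check. It considers only the mandatory profile and level of each capability as listed in Table A.1, and it assumes that every Baseline bitstream must carry constraint_set1_flag equal to 1, as the present document requires. The function name, the profile ranking and the level ordering list are constructs of this sketch; bit rate and frame size limits are not checked.

```python
# H.264/AVC levels in increasing order, up to the highest level used in Table A.1.
LEVELS = ["1", "1b", "1.1", "1.2", "1.3", "2", "2.1", "2.2", "3", "3.1", "3.2", "4"]
PROFILE_RANK = {"Baseline": 0, "Main": 1, "High": 2}

# Mandatory profile and level per capability (Table A.1).
IRD = {
    "A": ("Baseline", "1b"),
    "B": ("Baseline", "1.2"),
    "C": ("Baseline", "2"),
    "D": ("Main", "3"),
    "E": ("High", "4"),
}


def can_decode(capability, profile, level, constraint_set1_flag):
    """Return True if an IP-IRD of the given capability must be able to decode the stream."""
    ird_profile, ird_level = IRD[capability]
    if LEVELS.index(level) > LEVELS.index(ird_level):
        return False
    if profile == "Baseline":
        # Baseline streams are only covered when FMO, ASO and redundant slices are excluded.
        return bool(constraint_set1_flag)
    return PROFILE_RANK[profile] <= PROFILE_RANK[ird_profile]


print(can_decode("D", "Baseline", "2", True))   # prints True
print(can_decode("D", "High", "3", False))      # prints False
```

Support for the optional profiles in Table A.1 would extend PROFILE_RANK on a per-receiver basis and is left out here.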

Each level specifies a maximum number of macroblocks per second that can be processed by a corresponding decoder 
(not explicitly listed in the table). Additionally, the maximum number of macroblocks per frame is restricted as well. 
For example, for the Capability D IP-IRD, the maximum number of macroblocks per frame is 1 620, 
corresponding to a 625 SD picture (level 3 of H.264/AVC). Combined with the maximum processing rate of 
40 500 macroblocks per second, this gives a maximum frame rate of 25 frames per second. 
Note that this also permits the processing of 525 SD pictures at 30 frames per second. 
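
The arithmetic in this example can be reproduced with a few lines of Python. The figure of 40 500 macroblocks per second is the level 3 value quoted above; the helper names are illustrative, and picture dimensions are rounded up to whole 16x16 macroblocks.

```python
def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks needed to cover a picture (dimensions rounded up)."""
    return -(-width // 16) * -(-height // 16)


def max_frame_rate(width, height, max_macroblocks_per_second):
    """Highest frame rate that stays within the macroblock processing rate."""
    return max_macroblocks_per_second / macroblocks_per_frame(width, height)


LEVEL_3_MB_PER_SECOND = 40500

print(macroblocks_per_frame(720, 576))                   # prints 1620
print(max_frame_rate(720, 576, LEVEL_3_MB_PER_SECOND))   # prints 25.0
print(macroblocks_per_frame(720, 480))                   # prints 1350
print(max_frame_rate(720, 480, LEVEL_3_MB_PER_SECOND))   # prints 30.0
```

The same helper gives 99 macroblocks for 176x144, 396 for 352x288 and 8 160 for 1 920 x 1 080, all within the frame size limits listed in Table A.1.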



A.3.1.6 Other video parameters 



The present document is intended to cover a large variety of applications. Therefore, parameters such 
as frame rate, aspect ratio, chromaticity, chroma, and random access points are not specified as restrictively as in 
TS 101 154 [7]. 

For parameters such as frame rate and aspect ratio, the constraints as specified in H.264/AVC are sufficient and need no 
further adjustment. It is only recommended to avoid extreme values. 

For parameters such as chromaticity and chroma, it is recommended to utilize the parameters that are specified in the 
VUI of H.264/AVC which is part of the sequence parameter set. 

Random access points are provided through so-called Instantaneous Decoding Refresh (IDR) pictures. In our 
recommendations, we distinguish broadcast and other applications. For broadcast applications it is recommended that 
a random access point (e.g. an IDR picture), sent together with the sequence and picture parameter sets, is encoded at 
least once every 500 ms. For multicast or streaming applications a maximum interval of 5 s between random access 
points should not be exceeded. 
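
A stream monitor could check these recommendations as in the following Python sketch. The two limits come from the paragraph above; the function name, the application labels and the timestamp lists are hypothetical.

```python
# Recommended maximum spacing between random access points, in milliseconds.
MAX_INTERVAL_MS = {"broadcast": 500, "streaming": 5000}


def random_access_ok(rap_times_ms, application):
    """Check that consecutive random access points are no further apart than recommended."""
    limit = MAX_INTERVAL_MS[application]
    gaps = [later - earlier for earlier, later in zip(rap_times_ms, rap_times_ms[1:])]
    return all(gap <= limit for gap in gaps)


# Hypothetical presentation times of IDR pictures, in milliseconds.
print(random_access_ok([0, 480, 960, 1460], "broadcast"))   # prints True
print(random_access_ok([0, 2000, 8000], "streaming"))       # prints False
```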

A.3.2 VC-1 video 
A.3.2.1 Overview 

The VC-1 bit stream is defined as a hierarchy of layers. This is conceptually similar to the notion of a protocol stack of 
networking protocols. The outermost layer is called the sequence layer. The other layers are entry-point, picture, slice, 
macroblock and block. In the Simple and Main profiles, a sequence in the sequence layer consists of a series of one or 
more coded pictures. In the Advanced profile, a sequence consists of one or more entry-point segments, where each 
entry-point segment consists of a series of one or more pictures, and where the first picture in each entry-point segment 
provides random access. 

In the VC-1 Advanced Profile, the sequence layer header contains the parameters required to initialize the VC-1 
decoder. These parameters apply to all entry-point segments until the next occurrence of a sequence layer header in the 
coded bit stream. For Simple and Main Profiles, the decoder initialization parameters are conveyed as Decoder 
Initialization Metadata structures (see annex J of SMPTE 421M [17]) carried in the SDP datagrams signalling the 
VC-1-based session, rather than via a sequence layer header and an entry-point segment header. Therefore, all IP-IRDs 
supporting VC-1 must be capable of extracting this data from the SDP datagrams. 

A.3.2.2 Explanation of VC-1 profiles and levels 

As with MPEG-2 and H.264/AVC, Profiles and Levels are used to specify conformance points for VC-1. A profile 
defines a sub-set of the VC-1 standard which includes a specific set of coding tools and syntax. A level is a defined set of 
constraints on the values which can be taken by key parameters (such as bit rate or video resolution) within a particular 
profile. A decoder claiming conformance to a specific profile must support all features in that profile. Encoders are not 
required to make use of any particular set of features supported in a profile but have to provide conforming bitstreams, 
i.e. bitstreams that can be decoded by conforming decoders. 

Three profiles have been specified: Simple, Main and Advanced. For each profile a number of levels have been defined: 
two levels with Simple Profile, three levels with Main Profile and five levels with Advanced Profile. Note that VC-1 
levels have been defined to be specific to particular profiles; this is in contrast with MPEG-2 and H.264/AVC where 
levels are largely independent of profiles. 

Table A.2 summarizes the coding tools that are included in each profile. 

Table A.2 

Feature                                  | Simple Profile | Main Profile | Advanced Profile 
-----------------------------------------|----------------|--------------|----------------- 
Baseline intra frame compression         | Yes            | Yes          | Yes 
Variable-sized transform                 | Yes            | Yes          | Yes 
16-bit transform                         | Yes            | Yes          | Yes 
Overlapped transform                     | Yes            | Yes          | Yes 
4 motion vectors per macroblock          | Yes            | Yes          | Yes 
1/4 pixel luminance motion compensation  | Yes            | Yes          | Yes 
1/4 pixel chrominance motion compensation | -             | Yes          | Yes 
Start codes                              | -              | Yes          | Yes 
Extended motion vectors                  | -              | Yes          | Yes 
Loop filter                              | -              | Yes          | Yes 
Dynamic resolution change                | -              | Yes          | Yes 
Adaptive macroblock quantization         | -              | Yes          | Yes 
B frames                                 | -              | Yes          | Yes 
Intensity compensation                   | -              | Yes          | Yes 
Range adjustment                         | -              | Yes          | Yes 
Field and frame coding modes             | -              | -            | Yes 
GOP Layer                                | -              | -            | Yes 
Display metadata                         | -              | -            | Yes 

The Advanced Profile bitstream includes a number of fields which provide information useful to the post-decode 
display process. This information, collectively known as "display metadata" is output by the decoding process. Its use in 
the display process is optional, but recommended. 

A.3.2.3 Summary of key tools and parameter ranges for capability A to E IRDs 

Five combinations of profile and level have been defined in the present document as VC-1 IP-IRDs with Capability A 
to E. The combinations of VC-1 profile and level for each of the five Capabilities have been chosen to facilitate the 
design of an IP-IRD that has the computational resource required to support both H.264/AVC and VC-1 at the same 
Capability. However, the differences between the two standards mean that this alignment cannot be guaranteed. 

Table A.3 summarizes the assignment of profiles and levels to the five IP-IRDs that are specified in the present 
document. 

Table A.3 

Capability | Profile  | Level | Max frame size (macroblocks) | Example video formats                                               | Max bit rate (kbit/s) 
-----------|----------|-------|------------------------------|---------------------------------------------------------------------|---------------------- 
A          | Simple   | LL    | 99                           | 176x144, 15 Hz                                                      | 96 
B          | Simple   | ML    | 396                          | 352x288, 15 Hz; 320 x 240, 24 Hz; QCIF = 176x144, 30 Hz             | 384 
C          | Advanced | L0    | 396                          | CIF = 352 x 288, 30 Hz                                              | 2 000 
D          | Advanced | L1    | 1 620                        | 625 SD = 720 x 576, 25 Hz; 525 SD = 720 x 480, 30 Hz                | 10 000 
E          | Advanced | L3    | 8 192                        | 1 080i HD = 1 920 x 1 080, 25/30 Hz; 720p HD = 1 280 x 720, 50/60 Hz | 45 000 

Note that IP-IRDs labelled with a particular capability Y are capable of decoding and rendering pictures that can be 
decoded by IP-IRDs labelled with a particular capability X with X being an earlier letter than Y in the alphabet. For 
instance, Capability D IP-IRDs are capable of decoding bitstreams conforming to Advanced Profile at L1 of VC-1 and 
below. Additionally, Capability D IP-IRDs are capable of decoding bitstreams that are also decodable by IP-IRDs with 
capabilities A, B, or C. 
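
Because each capability can decode what the earlier ones can, a service planner could pick the lowest VC-1 capability that accommodates a given stream. The Python sketch below uses only the frame size and bit rate limits of Table A.3; it does not check frame rate or profile-specific tools, and the function name is an assumption of this sketch.

```python
# (capability, profile, level, max frame size in macroblocks, max bit rate in kbit/s), from Table A.3.
VC1_IRD = [
    ("A", "Simple", "LL", 99, 96),
    ("B", "Simple", "ML", 396, 384),
    ("C", "Advanced", "L0", 396, 2000),
    ("D", "Advanced", "L1", 1620, 10000),
    ("E", "Advanced", "L3", 8192, 45000),
]


def lowest_capability(width, height, bit_rate_kbps):
    """Return the first capability whose frame size and bit rate limits fit, or None."""
    macroblocks = -(-width // 16) * -(-height // 16)  # round up to whole macroblocks
    for capability, _profile, _level, max_mbs, max_rate in VC1_IRD:
        if macroblocks <= max_mbs and bit_rate_kbps <= max_rate:
            return capability
    return None


print(lowest_capability(176, 144, 96))       # prints A
print(lowest_capability(720, 576, 8000))     # prints D
print(lowest_capability(1920, 1080, 50000))  # prints None
```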



A.4 Audio 



A.4.1 MPEG-4 high efficiency AAC v2 (HE AAC v2) 

The principal problem of traditional perceptual audio codecs at low bit rates is that they would need more bits than are 
available to encode the whole spectrum accurately. The result is either coding artefacts or the transmission of a 
reduced bandwidth audio signal. To resolve this problem, MPEG decided to add a bandwidth extension technology as a 
new tool to the MPEG-4 audio toolbox. With Spectral Band Replication (SBR) the higher frequency components of the 
audio signal are reconstructed at the decoder based on transposition and additional helper information. This method 
allows an accurate reproduction of the higher frequency components with a much higher coding efficiency compared to a 
traditional perceptual audio codec. Within MPEG the resulting audio codec is called MPEG-4 High Efficiency AAC (HE AAC) 
and is the combination of the MPEG-4 Audio Object Types AAC-Low Complexity (LC) and SBR. 
It is not a replacement for AAC, but rather a superset which extends the reach of high-quality 
MPEG-4 Audio to much lower bitrates. HE AAC decoders will decode both plain AAC and AAC enhanced with 
SBR. The result is a backward compatible extension of the standard. 

The basic idea behind SBR is the observation that usually a strong correlation between the characteristics of the high 
frequency range of a signal (further referred to as "highband") and the characteristics of the low frequency range 
(further referred to as "lowband") of the same signal is present. Thus, a good approximation of the representation of the 
original input signal highband can be achieved by a transposition from the lowband to the highband. In addition to the 
transposition, the reconstruction of the highband incorporates shaping of the spectral envelope. This process is 
controlled by transmission of the highband spectral envelope of the original input signal. Additional guidance 
information for the transposing process is sent from the encoder, which controls means, such as inverse filtering, noise 
and sine addition. This transmitted side information is further referred to as SBR data. 

In June 2004 MPEG extended its toolbox with the Audio Object Type Parametric Stereo (PS), which enables stereo 
coding at very low bitrates. The principle behind the PS tool is to transmit a mono signal coded in HE AAC format 
together with a description of the stereo image. The PS tool is used at bit rates in the low bit rate range. The resulting 
MPEG profile is called MPEG-4 HE AAC v2. Figure A.8 shows the different MPEG tools used in the MPEG-4 
HE AAC v2 profile. An HE AAC v2 decoder will decode all three profiles: AAC-LC, HE AAC and HE AAC v2. 

[Figure: AAC-LC combined with SBR forms MPEG-4 High Efficiency AAC; adding Parametric Stereo forms MPEG-4 High Efficiency AAC v2] 

Figure A.8: MPEG tools used in the HE AAC v2 profile 

Figure A.9 shows a block diagram of a HE AAC v2 Encoder. At the lowest bitrates the PS tool is used. At higher 
bitrates, normal stereo operation is performed. The PS encoding tool estimates the parameters characterizing the 
perceived stereo image of the input signal. These parameters are embedded in the SBR data. If the PS tool is used, a 
stereo to mono downmix of the input signal is applied, which is then fed into the aacPlus encoder operating in mono. 
SBR data is embedded into the AAC bitstream by means of the extension_payload() element. Two types of SBR 
extension data can be signalled through the extension_type field of the extension_payload(). For compatibility reasons 
with existing AAC only decoders, two different methods for signalling the existence of an SBR payload can be selected. 
Both methods are described below. 

Figure A.9: HE AAC v2 encoder 

The HE AAC v2 decoder is depicted in Figure A.10. The coded audio stream is fed into a demultiplexing unit prior to 
the AAC decoder and the SBR decoder. The AAC decoder reproduces the lower frequency part of the audio spectrum. 
The time domain output signal from the underlying AAC decoder at the sampling rate $f_{s,\mathrm{AAC}}$ is first fed into a 
32 channel Quadrature Mirror Filter (QMF) analysis filterbank. Secondly, the high frequency generator module 
recreates the highband by patching QMF subbands from the existing low band to the high band. Furthermore, inverse 
filtering is applied on a per QMF subband basis, based on the control data obtained from the bitstream. The envelope 
adjuster modifies the spectral envelope of the regenerated highband, and adds additional components such as noise and 
sinusoids, all according to the control data in the bitstream. In case of a stream using Parametric Stereo, the mono 
output signal from the underlying HE AAC decoder is converted into a stereo signal. This processing is carried out in 
the QMF domain and is controlled by the Parametric Stereo parameters embedded in the SBR data. Finally a 64 channel 
QMF synthesis filterbank is applied to obtain a time-domain output signal at twice the sampling rate, 
i.e. $f_{s,\mathrm{out}} = f_{s,\mathrm{SBR}} = 2 \times f_{s,\mathrm{AAC}}$. 

Figure A.10: HE AAC v2 decoder 

A.4.1.1 HE AAC v2 Levels and Main Parameters for DVB 

MPEG-4 provides a huge toolset for the coding of audio objects. In order to allow effective implementations of the 
standard, subsets of this toolset have been identified that can be used for specific applications. The function of these 
subsets, called "Profiles," is to limit the toolset a conforming decoder must implement. For each of these Profiles, one or 
more Levels have been specified, thus restricting the computational complexity. 

The HE AAC v2 Profile is introduced as a superset of the AAC Profile. Besides the Audio Object Type (AOT) 
AAC LC (which is present in the AAC Profile), it includes the AOT SBR and the AOT PS. Levels are introduced 
within these Profiles in such a way that a decoder supporting the HE AAC v2 Profile at a given level can decode an 
AAC Profile and an HE AAC Profile stream at the same or lower level. 

Table A.4: Levels within the HE AAC v2 Profile 

Level | Max. channels/object | Max. AAC sampling rate, SBR not present (kHz) | Max. AAC sampling rate, SBR present (kHz) | Max. SBR sampling rate (kHz) (in/out) 
------|----------------------|-----------------------------------------------|-------------------------------------------|-------------------------------------- 
1     | NA                   | NA                                            | NA                                        | NA 
2     | 2                    | 48                                            | 24                                        | 24/48 (see note 1) 
3     | 2                    | 48                                            | 48 (see note 3)                           | 48/48 (see note 2) 
4     | 5                    | 48                                            | 24/48 (see note 4)                        | 48/48 (see note 2) 
5     | 5                    | 96                                            | 48                                        | 48/96 

NOTE 1: A level 2 HE AAC v2 Profile decoder implements the baseline version of the parametric stereo 
tool. Higher level decoders are not limited to the baseline version of the parametric stereo tool. 

NOTE 2: For Level 3 and Level 4 decoders, it is mandatory to operate SBR in a downsampled mode if the 
sampling rate of the AAC core is higher than 24 kHz. Hence, if SBR operates on a 48 kHz AAC 
signal, the internal sampling rate of SBR will be 96 kHz; however, the output signal will be 
downsampled by SBR to 48 kHz. 

NOTE 3: If Parametric Stereo data is present the maximum AAC sampling rate is 24 kHz; if Parametric 
Stereo data is not present the maximum AAC sampling rate is 48 kHz. 

NOTE 4: For one or two channels the maximum AAC sampling rate, with SBR present, is 48 kHz. For more 
than two channels the maximum AAC sampling rate, with SBR present, is 24 kHz. 



For DVB, level 2 is supported for mono and stereo audio signals and level 4 for multichannel audio signals. The Low 
Frequency Enhancement channel of a 5.1 audio signal is included in the level 4 definition of the number of channels. 
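
The sampling rate limits of Table A.4 and its notes can be expressed as a short Python sketch. The function names are illustrative. The handling of the downsampled mode assumes that the output then stays at the AAC core rate, which matches the 48 kHz example in note 2 but is an assumption for other core rates.

```python
def max_core_rate_with_sbr_khz(level, channels, ps_present=False):
    """Maximum AAC core sampling rate (kHz) when SBR is present, per Table A.4."""
    if level == 2:
        return 24
    if level == 3:
        return 24 if ps_present else 48       # note 3
    if level == 4:
        return 48 if channels <= 2 else 24    # note 4
    if level == 5:
        return 48
    raise ValueError("no SBR limits listed for level %r" % (level,))


def sbr_output_rate_khz(level, core_rate_khz):
    """Output sampling rate (kHz) of the SBR decoder for a given AAC core rate."""
    if level in (3, 4) and core_rate_khz > 24:
        # Note 2: downsampled SBR mode; output assumed to remain at the core rate.
        return core_rate_khz
    return 2 * core_rate_khz                  # normal mode: output rate is twice the core rate


# DVB use: level 2 for mono/stereo, level 4 for multichannel.
print(max_core_rate_with_sbr_khz(2, channels=2))   # prints 24
print(sbr_output_rate_khz(2, 24))                  # prints 48
print(max_core_rate_with_sbr_khz(4, channels=5))   # prints 24
print(sbr_output_rate_khz(4, 48))                  # prints 48
```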

A.4.1.2 Methods for signalling of SBR and/or PS 

When SBR and/or PS are used, there are several possible ways to signal the presence of SBR and/or PS data [2]. 
Within the context of DVB services over IP it is recommended to use backward compatible explicit signalling. Here the 
respective extension Audio Object Type is signalled at the end of the AudioSpecificConfig(). 

A.4.2 Extended AMR-WB (AMR-WB+) 

The AMR-WB+ audio codec can encode mono and stereo, up to 48 kbit/s for stereo. It also supports downmixing to 
mono at the decoder. The AMR-WB+ codec has been fully specified in TS 126 290 [12] including error concealment. The 
source code for both encoder and decoder has been fully specified in TS 126 304 [15] and in TS 126 273 [14]. The 
transport has been specified in RFC 4352 [13]. 

Figure A.11 presents the AMR-WB+ encoder structure. The input signal is separated in two bands. The first band is the 
low-frequency (LF) signal, which is critically sampled at Fs/2. The second band is the high-frequency (HF) signal, 
which is also downsampled to obtain a critically sampled signal. The LF and HF signals are then encoded using two 
different approaches: the LF signal is encoded and decoded using the "core" encoder/decoder, based on switched 
ACELP and Transform Coded eXcitation (TCX). In ACELP mode, the standard AMR-WB codec is used. The HF 
signal is encoded with relatively few bits using a BandWidth Extension (BWE) method. 

The parameters transmitted from encoder to decoder are the mode selection bits, the LF parameters and the HF 
parameters. The codec operates in superframes of 1 024 samples. The parameters for each of them are decomposed into 
four packets of identical size. 

When the input signal is stereo, the left and right channels are combined into a mono signal for ACELP/TCX encoding, 
whereas the stereo encoding receives both input channels. 

Figure A.12 presents the AMR-WB+ decoder structure. The LF and HF bands are decoded separately, after which they 
are combined in a synthesis filterbank. If the output is restricted to mono only, the stereo parameters are omitted and the 
decoder operates in mono mode. 

Figure A.11: High-level structure of AMR-WB+ encoder 

Figure A.12: High-level structure of AMR-WB+ decoder 

A.4.2.1 Main AMR-WB+ parameters for DVB 

The AMR-WB+ codec has been designed for mobile applications. Therefore no additional restrictions are required for 
IPDC over DVB-H or other DVB applications. 

A.4.3 AC-3 

The AC-3 digital compression algorithm can encode from 1 to 5.1 channels of source audio from a PCM representation 
into a serial bit stream at data rates ranging from 32 kbit/s to 640 kbit/s. The 0.1 channel refers to a fractional bandwidth 
channel intended to convey only low frequency signals. 






The AC-3 algorithm achieves high coding gain by coarsely quantizing a frequency domain representation of the audio 
signal. A block diagram of this process is shown in Figure A.13. The first step in the encoding process is to transform 
the representation of audio from a sequence of PCM time samples into a sequence of blocks of frequency coefficients. 
This is done in the analysis filter bank. Overlapping blocks of 512 time samples are multiplied by a time window and 
transformed into the frequency domain. Due to the overlapping blocks, each PCM input sample is represented in two 
sequential transformed blocks. The frequency domain representation may then be decimated by a factor of two so that 
each block contains 256 frequency coefficients. The individual frequency coefficients are represented in binary 
exponential notation as a binary exponent and a mantissa. The set of exponents is encoded into a coarse representation 
of the signal spectrum which is referred to as the spectral envelope. This spectral envelope is used by the core bit 
allocation routine which determines how many bits to use to encode each individual mantissa. The spectral envelope 
and the coarsely quantized mantissas for 6 audio blocks (1 536 audio samples per channel) are formatted into an AC-3 
frame. The AC-3 bit stream is a sequence of AC-3 frames. 
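
The block and frame figures above lead to some simple arithmetic, sketched here in Python. The constants come from the text (512-sample overlapping windows, 6 blocks per frame); the function names and the 48 kHz and 384 kbit/s example values are assumptions, and the frame size is the average implied by the bit rate rather than a value read from the AC-3 frame size tables.

```python
WINDOW_LENGTH = 512                          # time samples per overlapping block
NEW_SAMPLES_PER_BLOCK = WINDOW_LENGTH // 2   # 50 % overlap: 256 new samples and coefficients per block
BLOCKS_PER_FRAME = 6
SAMPLES_PER_FRAME = NEW_SAMPLES_PER_BLOCK * BLOCKS_PER_FRAME   # 1 536 samples per channel


def frame_duration_ms(sample_rate_hz):
    """Duration of one AC-3 frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def average_frame_size_bytes(bit_rate_kbps, sample_rate_hz):
    """Average number of bytes per frame implied by the bit rate."""
    return bit_rate_kbps * 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz / 8


print(SAMPLES_PER_FRAME)                      # prints 1536
print(frame_duration_ms(48000))               # prints 32.0
print(average_frame_size_bytes(384, 48000))   # prints 1536.0
```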

Figure A.13: The AC-3 encoder 

The actual AC-3 encoder is more complex than indicated in Figure A.13. The following functions not shown above are 
also included: 

1) A frame header is attached which contains information (bit rate, sample rate, number of encoded channels, 
etc.) required to synchronize to and decode the encoded bit stream. 

2) Error detection codes are inserted in order to allow the decoder to verify that a received frame of data is error 
free. 

3) The analysis filterbank spectral resolution may be dynamically altered so as to better match the time/frequency 
characteristic of each audio block. 

4) The spectral envelope may be encoded with variable time/frequency resolution. 

5) A more complex bit allocation may be performed, and parameters of the core bit allocation routine modified so 
as to produce a more nearly optimal bit allocation. 

6) The channels may be coupled together at high frequencies in order to achieve higher coding gain for operation 
at lower bit rates. 

7) In the two-channel mode, a rematrixing process may be selectively performed in order to provide additional 
coding gain, and to allow improved results to be obtained in the event that the two-channel signal is decoded 
with a matrix surround decoder. 
The decoding process is basically the inverse of the encoding process. The decoder, shown in Figure A.14, must 
synchronize to the encoded bit stream, check for errors, and de-format the various types of data such as the encoded 
spectral envelope and the quantized mantissas. The bit allocation routine is run and the results used to unpack and 
de-quantize the mantissas. The spectral envelope is decoded to produce the exponents. The exponents and mantissas are 
transformed back into the time domain to produce the decoded PCM time samples. 
Figure A.14: The AC-3 decoder 
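
As a rough illustration of the exponent and mantissa representation used by the encoder and decoder, the sketch below splits a coefficient into a mantissa and a binary exponent and rebuilds it. It assumes that the exponent counts the power of two by which the mantissa is scaled down; the function names and the upper bound of 24 on the exponent are assumptions for this example, and bit allocation, mantissa quantization and the inverse transform are all omitted.

```python
import math

MAX_EXPONENT = 24  # assumed upper bound on the exponent value


def split_coefficient(x):
    """Return (mantissa, exponent) such that x == mantissa * 2**-exponent."""
    if x == 0.0:
        return 0.0, MAX_EXPONENT
    _, e = math.frexp(x)                      # x == m * 2**e, 0.5 <= |m| < 1
    exponent = min(max(-e, 0), MAX_EXPONENT)  # clamp to the assumed range
    return x * 2.0 ** exponent, exponent


def rebuild_coefficient(mantissa, exponent):
    """Scale a de-quantized mantissa back by 2**-exponent."""
    return mantissa * 2.0 ** -exponent
```

For example, the coefficient 0.09375 splits into a mantissa of 0.75 and an exponent of 3, and rebuilding from that pair returns the original value.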

The actual AC-3 decoder is more complex than indicated in Figure A.14. The following decoder operations not shown 
above are included: 

1) Error concealment or muting may be applied in case a data error is detected. 

2) Channels which have had their high-frequency content coupled together must be de-coupled. 

3) Dematrixing must be applied (in the 2-channel mode) whenever the channels have been rematrixed. 

4) The synthesis filterbank resolution must be dynamically altered in the same manner as the encoder analysis 
filter bank had been during the encoding process. 

A.4.4 Enhanced AC-3 

Enhanced AC-3 is an evolution of the AC-3 coding system. The addition of a number of low data rate coding tools 
enables use of Enhanced AC-3 at a lower bit rate than AC-3 for high quality, and at much lower bit rates than AC-3 
for medium quality. A greatly expanded and more flexible bitstream syntax enables a number of advanced features, 
including expanded data rate flexibility and support for variable bit rate (VBR) coding. A bitstream structure based on 
substreams allows delivery of programs containing more than 5.1 channels of audio, which supports next-generation 
content formats and the channel configuration standards developed for D-Cinema. The same structure supports multiple 
audio programs carried within a single bitstream, suitable for deployment of services such as Hearing Impaired/Visual 
Impaired. To control the combination of audio programs carried in separate substreams or bitstreams, Enhanced AC-3 
includes comprehensive mixing metadata, enabling a content creator to control the mixing of two audio streams in an 
IP-IRD. To ensure compatibility of the most complex bitstream configuration with even the simplest Enhanced AC-3 
decoder, the bitstream structure is hierarchical: decoders will accept any Enhanced AC-3 bitstream and will extract only 
the portions that are supported by that decoder, without requiring additional processing. To address the need to connect 
IP-IRDs that include Enhanced AC-3 to the millions of home theatre systems that feature legacy AC-3 decoders via 
S/PDIF, it is possible to perform a modest-complexity conversion of an Enhanced AC-3 bitstream to an AC-3 bitstream 
for S/PDIF compatibility. 

Enhanced AC-3 includes the following coding tools that improve coding efficiency when compared to AC-3. 

• Spectral Extension: recreates a signal's high frequency amplitude spectrum from side data transmitted in the bit 
stream. This tool offers improvements in reproduction of high frequency signal content at low data rates. 

• Transient Pre-Noise Processing: synthesizes a section of PCM data just prior to a transient. This feature 
improves low data rate performance for transient signals. 

• Adaptive Hybrid Transform Processing: improves coding efficiency and quality by increasing the length of the 
transform. This feature improves low data rate performance for signals with primarily tonal content. 

• Enhanced Coupling: improves on traditional coupling techniques by allowing the technique to be used at lower 
frequencies than conventional coupling, thus increasing coder efficiency. 



A.5 The DVB IP datacast application 

Annex B of the present document defines application-specific constraints on the use of the toolbox for the particular 
case of DVB IP Datacast applications. These applications are mainly focused on handheld devices with severe 
limitations on computational resources and battery. Hence, the allowed values of parameters such as the picture size are 
limited. In addition, the desire to harmonize such applications with 3GPP specifications has led to a strong 
recommendation that each IP-IRD that is to be used for DVB IP Datacast applications is capable of decoding video 
bitstreams conforming to H.264/AVC [1]. 



A.6 Future work 



In common with TS 101 154 [7] and TS 102 154 [8], the present document is a living document, subject to periodic 
revision. The intention is to develop revisions in a largely backwards compatible manner, so that no changes to the 
mandatory functionality of a previously defined IP-IRD are made between one edition and the next. 

One specific issue is the possibility of extending the video specification to include even higher resolution content, such 
as 1 080 p at 50 Hz and 60 Hz frame rate. If this is done, it is likely that H.264/AVC High Profile at Level 4.2 and VC-1 
Advanced Profile at Level L4 would be chosen. 
Annex B (normative): 

TS 102 005 usage in DVB IP datacast 

B.1 Scope 

This annex describes the usage of TS 102 005 in TS 102 468 through specifying additional constraints that apply to the 
specifications in clauses 1 to 6 of the present document. 



B.2 Introduction 



This annex contains the technical specifications that address the requirements for DVB IP Datacast applications. These 
are mainly focused on handheld devices with severe limitations on computational resources and battery. Hence, the 
allowed values of parameters such as the picture size are limited. Nevertheless, IP-IRDs permitting larger spatial video 
resolutions may also be used in DVB IP Datacast applications. 

Conversely, it is not mandatory for IP datacast services which do not conform to TS 102 468 to follow the additional 
constraints specified in this annex. 



B.3 Systems layer 



This clause specifies constraints on the RTP payload formats, 3GPP file format, and "MP4" file format that are to be 
used for DVB IP Datacast applications. 

B.3.1 Transport over IP networks/RTP packetization formats 

The specifications in clause 4.1, including its constituent clauses, shall apply subject to the following further constraints 
on clause 4.1.1 for the RTP packetization of H.264/AVC for DVB IP Datacast applications and on clause 4.1.3 for the 
RTP packetization of HE AAC v2 for DVB IP Datacast applications. 

B.3.1.1 Further constraint to RTP packetizations of H.264/AVC 

Encoding: The Single NAL Unit Mode or the Non-Interleaved Mode of RFC 3984 [5] shall be used for the 
packetization of H.264/AVC data into RTP. 

Decoding: Each IP-IRD supporting H.264/AVC shall be able to receive Single NAL Unit Mode and 
Non-Interleaved Mode RTP packets with H.264/AVC data as defined in RFC 3984 [5]. 
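
The packetization constraint above can be checked mechanically when the mode is signalled in SDP. The sketch below assumes the stream is described by an a=fmtp line carrying the packetization-mode parameter defined in RFC 3984, where 0 denotes the Single NAL Unit Mode, 1 the Non-Interleaved Mode and 2 the Interleaved Mode, and where an absent parameter means mode 0. The function names and the sample lines are hypothetical.

```python
# Modes permitted for DVB IP Datacast according to the constraint above.
ALLOWED_MODES = {0: "Single NAL Unit Mode", 1: "Non-Interleaved Mode"}


def packetization_mode(fmtp_line):
    """Extract packetization-mode from an SDP a=fmtp line (default 0)."""
    _, _, params = fmtp_line.partition(" ")
    for item in params.split(";"):
        name, _, value = item.strip().partition("=")
        if name == "packetization-mode":
            return int(value)
    return 0  # RFC 3984: an absent parameter implies Single NAL Unit Mode


def mode_allowed(fmtp_line):
    """True if the signalled mode is one of the two permitted modes."""
    return packetization_mode(fmtp_line) in ALLOWED_MODES
```

A line such as "a=fmtp:96 profile-level-id=42e00d; packetization-mode=1" passes the check, whereas the same line with packetization-mode=2 does not.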

B.3.1.2 Further constraint to RTP packetizations of HE AAC v2 

Encoding: The interleaving mode of RFC 3640 [4] shall not be used for the packetization of HE AAC v2 data 
into RTP. 

Decoding: An IP-IRD supporting HE AAC v2 shall be able to decode non-interleaved access units of 
RFC 3640 [4]. 

B.3.2 File storage for download services 

The specifications in clause 4.2, including its constituent clauses, shall apply. 
B.4 Video 

This clause specifies constraints on the video encoding, decoding and rendering for DVB IP Datacast applications. 

It is strongly recommended that each IP-IRD that is to be used for DVB IP Datacast applications is capable of decoding 
video bitstreams conforming to H.264/AVC as specified in [1]. IP-IRDs that are used for DVB IP Datacast applications 
may be capable of decoding video bitstreams conforming to VC-1 as specified in [17]. Encoded video bitstreams for 
DVB IP Datacast applications shall conform to either H.264/AVC or VC-1. 

Clause B.4.1 defines the constraints for encoding and decoding with H.264/AVC, whilst clause B.4.2 defines the 
constraints for encoding and decoding with VC-1. 

B.4.1 H.264/AVC 
B.4.1.1 Profile and level 

Encoding: For all Capability Bitstreams except Capability C Bitstreams the specifications in clause 5.1.1 
shall apply. 

Capability C Bitstreams RTP packetized for real-time delivery shall conform to the restrictions 
described in ITU-T Recommendation H.264 / ISO/IEC 14496-10 for Level 1.3 of the Baseline 
Profile with constraint_set1_flag being equal to 1. 

Capability C Bitstreams encapsulated in 3GPP file format or in "MP4" file format shall conform 
to the restrictions described in ITU-T Recommendation H.264 / ISO/IEC 14496-10 for Level 2 of 
the Baseline Profile with constraint_set1_flag being equal to 1. 

Decoding: For all Capability IP-IRDs, the specifications in clause 5.1.1 shall apply in terms of the signalling 
of Profile and Level. However, it should be noted that IP-IRDs used for DVB IP Datacast 
applications are only required to be capable of decoding and rendering pictures from bitstreams 
that are subject to the additional constraints in terms of Sample Aspect Ratio, Frame Rate, 
Luminance Resolution and Picture Aspect Ratio that are specified in clauses B.4.1.2 and B.4.1.3. 
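
The Capability C encoding restrictions above can be sketched as a check on the three bytes that carry profile_idc, the constraint flags and level_idc, written as six hexadecimal digits in the style of the RFC 3984 profile-level-id parameter. The sketch assumes the usual H.264 values: profile_idc 66 for the Baseline Profile, constraint_set1_flag as the second most significant bit of the flags byte, and level_idc equal to ten times the level number. Treating any level_idc at or below the limit as conforming is a simplification, and the function and key names are hypothetical.

```python
BASELINE_PROFILE_IDC = 66      # Baseline Profile
CONSTRAINT_SET1_MASK = 0x40    # constraint_set1_flag within the flags byte
MAX_LEVEL_IDC = {
    "rtp": 13,    # Level 1.3 for RTP packetized real-time delivery
    "file": 20,   # Level 2 for 3GPP or "MP4" file encapsulation
}


def parse_profile_level_id(hex_str):
    """Split six hex digits into (profile_idc, flags byte, level_idc)."""
    raw = bytes.fromhex(hex_str)
    if len(raw) != 3:
        raise ValueError("expected exactly three bytes")
    return raw[0], raw[1], raw[2]


def capability_c_ok(hex_str, delivery):
    """Check a Capability C bitstream signature for 'rtp' or 'file' delivery."""
    profile_idc, flags, level_idc = parse_profile_level_id(hex_str)
    return (profile_idc == BASELINE_PROFILE_IDC
            and bool(flags & CONSTRAINT_SET1_MASK)
            and level_idc <= MAX_LEVEL_IDC[delivery])
```

With these assumptions, "42e00d" (Baseline, constraint_set1_flag set, Level 1.3) is accepted for both delivery methods, while "42e014" (Level 2) is accepted only for file delivery.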

B.4.1.2 Sample aspect ratio 

Encoding: Square (1:1) sample aspect ratio shall be used. 

Decoding: Each IP-IRD supporting H.264/AVC shall support decoding and rendering pictures with square 
(1:1) sample aspect ratio. 

B.4.1.3 Frame rate, luminance resolution, and picture aspect ratio 

The specifications on frame rate in clause 5.1.3, picture aspect ratio in clause 5.1.4, and luminance resolution in 
clause 5.1.5 are further constrained as follows. 

Encoding: One of the picture sizes listed in Table B.1 shall be used for the indicated capability class. The 
video frame rate shall not exceed the maximum frame rate specified for the picture size in the 
indicated capability class. The picture size shall not change during a streaming delivery session. 

Decoding: Each IP-IRD supporting H.264/AVC shall support decoding and rendering video encoded using 
the picture sizes and video frame rates indicated in Table B.1. Additionally, lower frame rates 
and variable frame rates shall be supported. 
Table B.1: H.264/AVC picture sizes for DVB IP Datacast applications 

Capability   Horizontal     Vertical       Maximum      Display 
class        resolution     resolution     frame rate   aspect ratio 
             (samples)      (samples)      (f/s) 
A            176            144            15           1.22:1 
A            128            96             30           4:3 (1.33:1) 
A            144            80             30           16:9 (1.80:1) 
B            176            144            30           1.22:1 
B            320            240            15           4:3 (1.33:1) 
B            320            176            15           16:9 (1.82:1) 
C            320            240            30           4:3 (1.33:1) 
C            320            176            30           16:9 (1.82:1) 
C            400            224            30           16:9 (1.79:1) 
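
The encoding rule for picture size and frame rate can be expressed as a lookup against the values of Table B.1. The sketch below copies the table entries; the data structure and function names are hypothetical. Because square (1:1) samples are required, the display aspect ratio is simply the horizontal resolution divided by the vertical resolution.

```python
# (horizontal, vertical) -> maximum frame rate in f/s, per capability class.
PICTURE_SIZES = {
    "A": {(176, 144): 15, (128, 96): 30, (144, 80): 30},
    "B": {(176, 144): 30, (320, 240): 15, (320, 176): 15},
    "C": {(320, 240): 30, (320, 176): 30, (400, 224): 30},
}


def encoding_allowed(capability, width, height, frame_rate):
    """True if the picture size is listed and the frame rate is within its limit."""
    max_rate = PICTURE_SIZES[capability].get((width, height))
    return max_rate is not None and 0 < frame_rate <= max_rate


def display_aspect_ratio(width, height):
    """Display aspect ratio for square samples, rounded as in the table."""
    return round(width / height, 2)
```

For instance, 320 x 240 at 30 f/s is rejected for capability class B but accepted for class C, and 400 x 224 gives a display aspect ratio of 1.79.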



B.4.1.4 Chromaticity 

The specifications in clause 5.1.6 shall apply. 

B.4.1.5 Chrominance format 

The specifications in clause 5.1.7 shall apply. 

B.4.1.6 Random access points 

Encoding: A Random Access Point shall be an IDR picture. Unless the sequence parameter set and picture 
parameter set are provided outside the elementary stream, the random access point shall include 
exactly one SPS (that is active), and the PPS that is required for decoding the associated picture. 

B.4.1.7 Output latency 

Encoding: Each H.264/AVC sequence parameter set shall contain a vui_parameters syntax structure 
including the num_reorder_frames syntax element (indicating the maximum number of frames that 
precede any frame in the coded video sequence in decoding order and follow it in output order 
during the streaming delivery session). 

NOTE: For fixed frame rate applications the num_reorder_frames can be used to compute the maximum decoding to 
output latency in the sequence. 
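
One simple reading of the NOTE is that, at a fixed frame rate, a frame may wait for at most num_reorder_frames frame periods between decoding and output. The sketch below computes that bound; it is an illustrative simplification that accounts only for reordering delay and ignores any additional buffering, and the function name is hypothetical.

```python
def max_output_latency_s(num_reorder_frames, frame_rate):
    """Upper bound on decoding-to-output delay in seconds at a fixed frame rate."""
    if frame_rate <= 0:
        raise ValueError("frame rate must be positive")
    if num_reorder_frames < 0:
        raise ValueError("num_reorder_frames must not be negative")
    return num_reorder_frames / frame_rate
```

Under this reading, a stream with num_reorder_frames equal to 0 adds no reordering delay, and a value of 3 at 30 f/s corresponds to a bound of 0.1 s.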

B.4.2 VC-1 

B.4.2.1 Profile and level 

The specifications in clause 5.2.1 shall apply in terms of the signalling of Profile and Level. However, it should be 
noted that IP-IRDs used for DVB IP Datacast applications are only required to be capable of decoding and rendering 
pictures from bitstreams that are subject to the additional constraints in terms of bit rate, sample aspect ratio, frame rate, 
luminance resolution and picture aspect ratio that are specified in clauses B.4.2.2, B.4.2.3 and B.4.2.4. 

B.4.2.2 Bit rate 

The specifications in clause 5.2.1 are constrained as follows: 

Encoding: The maximum bit rate of a Capability C Bitstream shall not exceed 768 kbit/s. 

Decoding: Each IP-IRD supporting VC-1 shall support any bit rate allowed by the indicated VC-1 Profile 
and Level, subject to a maximum of 768 kbit/s for a Capability C Bitstream. 
B.4.2.3 Sample aspect ratio 

Encoding: Square (1:1) sample aspect ratio shall be used. 

Decoding: Each IP-IRD supporting VC-1 shall support decoding and rendering pictures with square (1:1) 
sample aspect ratio. 

B.4.2.4 Frame rate, luminance resolution and picture aspect ratio 

The specifications on frame rate in clause 5.2.2, picture aspect ratio in clause 5.2.3, and luminance resolution in 
clause 5.2.4 are further constrained as follows: 



Encoding: One of the picture sizes listed in Table B.2 shall be used for the indicated capability class. The 
video frame rate shall not exceed the maximum frame rate specified for the picture size in the 
indicated capability class. The picture size shall not change during a streaming delivery session. 

Decoding: Each IP-IRD supporting VC-1 shall support decoding and rendering video encoded using the 
picture sizes and video frame rates indicated in Table B.2. Additionally, lower frame rates and 
variable frame rates shall be supported. 

Table B.2: VC-1 picture sizes for DVB IP Datacast applications 

Capability   Horizontal     Vertical       Maximum      Display 
class        resolution     resolution     frame rate   aspect ratio 
             (samples)      (samples)      (f/s) 
A            176            144            15           1.22:1 
A            128            96             30           4:3 (1.33:1) 
A            144            80             30           16:9 (1.80:1) 
B            176            144            30           1.22:1 
B            320            240            15           4:3 (1.33:1) 
B            320            176            15           16:9 (1.82:1) 
C            320            240            30           4:3 (1.33:1) 
C            320            176            30           16:9 (1.82:1) 
C            400            224            30           16:9 (1.79:1) 



B.4.2.5 Chromaticity 

The specifications in clause 5.2.5 shall apply. 

B.4.2.6 Random Access Points 

The specifications in clause 5.2.6 shall apply. 



B.5 Audio 



This clause specifies constraints on the audio encoding and decoding for DVB IP Datacast applications. 

Each IP-IRD that is to be used for DVB IP Datacast applications shall be capable of decoding audio bitstreams 
conforming to HE AAC v2 profile as specified in ISO/IEC 14496-3 [2]. In addition, IP-IRDs that are used for DVB IP 
Datacast applications may be capable of decoding audio bitstreams conforming to AMR-WB+ as specified in 
TS 126 290 [12]. Encoded audio bitstreams for DVB IP Datacast applications shall conform to either HE AAC v2 or 
AMR-WB+. 

Clause B.5.1 defines the constraints for encoding and decoding with HE AAC v2, whilst clause B.5.2 defines the 
constraints for encoding and decoding with AMR-WB+. 
B.5.1 HE AAC v2 

B.5.1.1 Audio mode 

The specifications in clause 6.1.1 shall apply. 

B.5.1.2 Profiles 

The specifications in clause 6.1.2 shall apply. 

B.5.1.3 Bit rate 

The specifications in clause 6.1.3 are constrained as follows: 

Encoding: The maximum bit rate of the encoded audio shall not exceed 192 kbit/s for a stereo pair. For 
Capability A and B bitstreams containing video, the maximum audio bit rate shall not exceed 
128 kbit/s for a stereo pair. The maximum bit rate of the encoded audio shall not exceed 320 kbit/s 
for multi-channel audio. 

Decoding: Each IP-IRD supporting HE AAC v2 shall support any bit rate allowed by the HE AAC v2 Profile 
and selected Level, subject to a maximum of 192 kbit/s for a stereo pair. 
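
The encoding limits above can be summarized as a small lookup. The sketch below is one possible reading: it assumes the 320 kbit/s limit applies to any multi-channel stream regardless of capability class, since the text states the 128 kbit/s limit only for a stereo pair. The function and parameter names are hypothetical.

```python
def max_audio_bit_rate_kbps(capability, has_video, multichannel):
    """Maximum HE AAC v2 encoding bit rate in kbit/s under the constraints above."""
    if multichannel:
        return 320  # multi-channel audio
    if has_video and capability in ("A", "B"):
        return 128  # stereo pair in Capability A and B bitstreams with video
    return 192      # stereo pair in all other cases


def audio_bit_rate_allowed(bit_rate_kbps, capability, has_video, multichannel):
    """True if the encoded audio bit rate does not exceed the applicable limit."""
    return bit_rate_kbps <= max_audio_bit_rate_kbps(
        capability, has_video, multichannel)
```

For example, a 160 kbit/s stereo pair is acceptable in a Capability C bitstream with video but exceeds the limit for a Capability B bitstream with video.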

B.5.1.4 Sampling frequency 

The specifications in clause 6.1.4 shall apply. 

B.5.1.5 Dynamic range control 

The specifications in clause 6.1.5 shall apply. 

B.5.1.6 Matrix downmix 

The specifications in clause 6.1.6 are constrained as follows: 

Decoding: The support of matrix downmix as defined in MPEG-4 is optional for each IP-IRD. 

B.5.2 AMR-WB+ 

Encoding and decoding of AMR-WB+ data in the IP Datacast IP-IRD shall follow the guidelines described 
in clauses 6.2.1 and 6.2.2. 